back to indexThe State of MCP observability: Observable.tools — Alex Volkov and Benjamin Eckel, W&B and Dylibso

00:00:00.000 |
Alex Volkov: Hey, folks. My name is Alex Volkov. I'm an AI evangelist with Weights and Biases. 00:00:20.000 |
Benjamin Echol: I'm Benjamin Echol. I'm co-founder and CTO of Dialypso. We're creators of mcp.run. 00:00:25.000 |
Benjamin Echol: All right. And we're here to talk to you about mcp observability. Hey, Ben. I want to ask you a question. 00:00:33.000 |
Benjamin Echol: Somebody who worked at Datadog before and somebody who runs multiple mcp servers and clients on production, 00:00:39.000 |
something that happened, advice that happened, something in my agent in production the other day. 00:00:44.000 |
Benjamin Echol: Okay. Yeah. I mean, we've been running mcp clients and servers in production since the beginning. 00:00:50.000 |
Benjamin Echol: Yeah, but wait. Aren't you like working at an observability company, Weights and Biases? 00:00:55.000 |
Benjamin Echol: And don't you work on, like, what's it called? Weave? 00:00:57.000 |
Benjamin Echol: Yep. That's true. I work on Weave. But since I started adding some powers to my agent via mcp, 00:01:04.000 |
all that observability that I'm used to from just having my own code run end to end has gone a little bit dark. 00:01:11.000 |
Benjamin Echol: So this is what we're here to talk to you guys about. The rise of mcp is creating an observability blind spot. 00:01:17.000 |
Benjamin Echol: As AI agents become more prevalent, the problem can compound with more and more tools via mcps, 00:01:23.000 |
the less the developers can know about the end-to-end happenings within their agent. 00:01:28.000 |
Benjamin Echol: Yeah. Yeah. So on mcp run, we're running both clients and servers. 00:01:32.000 |
Benjamin Echol: And because it's a new ecosystem, we've had to, like, cobble together a lot of our own ways to do observability. 00:01:38.000 |
Benjamin Echol: And I've been looking around, and it seems like everyone is sort of doing this in isolation, 00:01:42.000 |
as we're solving the same problems. So, you know, we wanted to bring the community together on this issue, 00:01:48.000 |
and so today we're going to talk about the state of observability in the mcp ecosystem. 00:01:53.000 |
Benjamin Echol: Yep. So why do we care about this, and why do we think that you guys should care about this? 00:01:57.000 |
Benjamin Echol: So if you don't have the ability to quickly understand why things went wrong on production, 00:02:01.000 |
where they went wrong, and how, your ability to quickly respond is greatly diminished. 00:02:06.000 |
Benjamin Echol: And we care deeply about -- we both build tools that need mcp observability, 00:02:11.000 |
and we support mcp, and we both care deeply about developer experience as well. 00:02:16.000 |
Benjamin Echol: Yeah. It's really important to me because enterprise engineering teams 00:02:19.000 |
don't ship something to production unless they know for sure that they're going to be able to identify 00:02:23.000 |
security and reliability problems before their customers do. 00:02:27.000 |
Benjamin Echol: And that's why they invest a ton of money in observability platforms. 00:02:30.000 |
Benjamin Echol: And so if you're going to ship mcp to these production environments, 00:02:34.000 |
you must seamlessly integrate with these observability platforms. 00:02:38.000 |
Benjamin Echol: Yep. So because we care deeply about developer experience at WNB Weave, 00:02:44.000 |
Benjamin Echol: I'm happy to announce here on stage that Weave supports MCP. Yay! 00:02:48.000 |
Benjamin Echol: As long as you're a developer of both the client and the server, 00:02:51.000 |
Benjamin Echol: all you need to do is set this MCP trace list operation environment variable 00:02:55.000 |
Benjamin Echol: on your client and server, and we'll show you the list tool calls, 00:02:59.000 |
Benjamin Echol: and we'll show you the duration of your MCP calls. 00:03:02.000 |
Benjamin Echol: This works currently with our Python-based clients, and this is how it looks. 00:03:06.000 |
Benjamin Echol: Super quick. With the red arrows, you can see the client traces, for example, 00:03:10.000 |
Benjamin Echol: and with the blue arrows, you can see we're pointing to the Calculate BMI tool 00:03:15.000 |
Benjamin Echol: and the other tool, and that's it. Observability solved, right? 00:03:19.000 |
Benjamin Echol: Let's get off the stage. We're done. 00:03:21.000 |
Benjamin Echol: So what about this Calculate BMI tool, this MCP server? Why can't I see into that? 00:03:30.000 |
Benjamin Echol: Yeah, also this seems like this is specific to Weave, right? 00:03:34.000 |
Benjamin Echol: Is there not like a vendor-neutral way to do this and standardize? 00:03:38.000 |
Benjamin Echol: This is a bespoke integration that we built into Weave into our SDKs in Python, 00:03:42.000 |
Benjamin Echol: and while working on this, while our developers have been building this, like, 00:03:46.000 |
Benjamin Echol: integration within our MCP tooling, I was advocating internally and externally 00:03:51.000 |
Benjamin Echol: that we should align with the open nature of MCP as a concept 00:03:54.000 |
Benjamin Echol: and created observable.tools. Maybe some of you have seen this. 00:03:57.000 |
Benjamin Echol: This is a manifesto to drive a conversation that this is a problem that needs solving. 00:04:01.000 |
Benjamin Echol: And between observability providers such as us and other folks that's been on stage before 00:04:07.000 |
Benjamin Echol: and going to be on the evil track tomorrow, to do observability in a vendor-neutral 00:04:12.000 |
Benjamin Echol: And so while working on observable tools, I realized-- I did some search-- realized 00:04:16.000 |
Benjamin Echol: that a vendor-neutral, scalable way to add observability exists. 00:04:20.000 |
Benjamin Echol: And there could be a great way to marry the two open protocols to work together. 00:04:25.000 |
Benjamin Echol: Fortunately, MCP-powered agents are really just another distributed system, and we've been doing that for decades. 00:04:31.000 |
Benjamin Echol: So open telemetry is just the way that we've settled on doing that. 00:04:36.000 |
Benjamin Echol: We're going to talk about OTEL a little bit. 00:04:38.000 |
Benjamin Echol: If you're not familiar with it, we need to learn about a few primitives first. 00:04:42.000 |
Benjamin Echol: So the main primitive that we need to learn about is the trace. 00:04:46.000 |
Benjamin Echol: So a trace is kind of like an atomic operation in your system. 00:04:49.000 |
Benjamin Echol: It's made up of a tree-like structure of steps that we call spans. 00:04:53.000 |
Benjamin Echol: And a span represents the duration and some arbitrary metadata for each step. 00:04:58.000 |
Benjamin Echol: And what this step is exactly is completely up to you to define. 00:05:01.000 |
Benjamin Echol: It can be as high-level as like an HTTP request. 00:05:04.000 |
Benjamin Echol: It can be as low-level as a tiny little function call. 00:05:06.000 |
Benjamin Echol: Here's an example of like a checkout experience, an API for a checkout. 00:05:11.000 |
Benjamin Echol: The size and position of each of these spans correspond to how long it took 00:05:15.000 |
Benjamin Echol: and where it sits in the call graph, respectively. 00:05:17.000 |
Benjamin Echol: And just from this data, you can tell a lot about a system and how to observe it. 00:05:21.000 |
Benjamin Echol: The other primitive you need to be aware of is syncs. 00:05:25.000 |
Benjamin Echol: So a sync is kind of like a centralized database where all your telemetry goes. 00:05:30.000 |
Benjamin Echol: But often they come in the form of this whole platform with like a UI and dashboards 00:05:34.000 |
Benjamin Echol: and alerting and monitoring and all those things. 00:05:37.000 |
Benjamin Echol: So there's a lot of logos here, Ben. 00:05:40.000 |
Benjamin Echol: Basically a sync is an open standard way for folks like collectors to receive those spans. 00:05:45.000 |
Benjamin Echol: As long as the developer instrumented their application code in a certain standard 00:05:50.000 |
Benjamin Echol: spec way, everybody can just receive those in the same unified way, right? 00:05:55.000 |
Benjamin Echol: If you squint, it's just kind of like a bunch of databases that all support 00:05:57.000 |
Benjamin Echol: the same schema and wired protocol. 00:06:01.000 |
Benjamin Echol: And in fact, they don't have to change much of their code or even change the code 00:06:04.000 |
Benjamin Echol: It could be just config, right? 00:06:06.000 |
Benjamin Echol: By the way, LM observability tools like WNBweave and some friends, 00:06:09.000 |
Benjamin Echol: Simon from here before and some other friends, all have switched to support 00:06:13.000 |
O-Tel as well, OpenTelemetry, is becoming like this global standard. 00:06:17.000 |
Benjamin Echol: Yeah, another great thing about having a centralized sync is the last concept, 00:06:23.000 |
Benjamin Echol: So going back to our checkout endpoint, if the fraud service sends its span to 00:06:28.000 |
the same sync, then we can stitch back together the traces and show the whole context. 00:06:33.000 |
Benjamin Echol: So maybe you're kind of seeing where the MCP server stuff comes in here. 00:06:37.000 |
Benjamin Echol: So, hey Ben, if it's possible via the integration to the open protocol, what 00:06:42.000 |
if I want to use MCP servers that other people host, like GitHub, like Stripe, like other folks? 00:06:47.000 |
Benjamin Echol: Yeah, that's a good question. 00:06:48.000 |
Benjamin Echol: So with MCP-enabled agents, or really just any distributed system, there are 00:06:54.000 |
There's when the client and server are in different domains, and then there's when they're 00:07:00.000 |
And by domain here, I don't necessarily mean the literal definition. 00:07:03.000 |
Benjamin Echol: I just mean like the administrative domain of control, right? 00:07:06.000 |
Benjamin Echol: Like do you own this MCP server? 00:07:09.000 |
Benjamin Echol: Or is it a third party thing? 00:07:11.000 |
Benjamin Echol: So your GitHub Stripe example is like a great example of like the different 00:07:17.000 |
Benjamin Echol: So this is a trace of an agent that is executing the prompt, read and summarize 00:07:25.000 |
Benjamin Echol: So it's going to reach out to this like remote fetch server to read Hacker News. 00:07:29.000 |
Benjamin Echol: But it appears to us that the trace is a single service span because it runs 00:07:34.000 |
Benjamin Echol: So it appears to a black box to us. 00:07:38.000 |
Benjamin Echol: But suppose we do own the server. 00:07:40.000 |
Benjamin Echol: Like maybe it's running in a different data center than the client. 00:07:44.000 |
Benjamin Echol: How do we get actually the whole context? 00:07:47.000 |
Benjamin Echol: So with distributed tracing and context propagation, we can have the remote fetch 00:07:52.000 |
server, send its spans to the same sync as the client. 00:07:55.000 |
Benjamin Echol: And the sync will just stitch together the missing parts of the trace back 00:07:59.000 |
Benjamin Echol: So in this graphic, you can see that we can now break into that fetch server. 00:08:02.000 |
Benjamin Echol: And we can see what it's doing. 00:08:04.000 |
Benjamin Echol: It's making some HTTP request that's taking roughly 350 milliseconds and then 00:08:08.000 |
it's doing a little crunching to create some markdown. 00:08:17.000 |
Benjamin Echol: We can have a whole hour talking about OTEL. 00:08:20.000 |
Benjamin Echol: But how do we can actually marry those two protocols together, right? 00:08:25.000 |
Benjamin Echol: Did the MCP spec folk deploy a way for us for observability? 00:08:30.000 |
Benjamin Echol: It was pretty tricky to get working. 00:08:33.000 |
Benjamin Echol: It does work today, but it required a little bit more work than it should 00:08:39.000 |
Benjamin Echol: So in order to do this, we need to, as I said, propagate the trace context from 00:08:42.000 |
Benjamin Echol: From the client to the server. 00:08:44.000 |
Benjamin Echol: So here's a TypeScript example. 00:08:46.000 |
Benjamin Echol: And when we call a tool in the client, we're going to extract our current 00:08:51.300 |
Benjamin Echol: And we're going to pass it along to the server. 00:08:54.880 |
Benjamin Echol: And we achieve this by basically just shuttling the data through the protocol's 00:09:00.260 |
Benjamin Echol: And now that we're inside the server, this would be like in the fetch server, 00:09:06.320 |
Benjamin Echol: We can pull that trace context out, inherit it as our current span. 00:09:11.320 |
Benjamin Echol: And then when we send our spans off to the sink, it's as if it came from that 00:09:17.120 |
Benjamin Echol: And the sink can stitch it back together. 00:09:20.200 |
Benjamin Echol: So you basically used an undocumented kind of property of sending the payload together 00:09:25.280 |
with the payload between clients and servers to pass along the data that OTL needs to connect 00:09:32.740 |
Benjamin Echol: I just kind of had to abuse the lower-level interface reserved for the protocol. 00:09:36.620 |
But higher-level ways should be provided through tooling, and that's something we should talk 00:09:45.600 |
Benjamin Echol: So by the way, this is not just a screenshot. 00:09:49.460 |
This is a working demo, so it's a lot more code than what I showed in the slide. 00:09:54.540 |
Benjamin Echol: So if you want to actually go see how this works and adapt this for your 00:09:59.760 |
And I think actually you did that to get it to work with Weave, right? 00:10:03.260 |
So now that we know how to pass context after you showed me the way, let's see how amazing 00:10:09.880 |
While Weave MCP, the thing I showed you guys before, was a bespoke solution baked into our 00:10:14.180 |
Python SDK for Weave, the huge benefit of MCP, generally not only observability-related, 00:10:19.800 |
is that servers and clients don't have to run on the same environment or share the same 00:10:23.800 |
code or be from the same programming language. 00:10:26.380 |
So while we were working on the Python SDK, you built an agent in TypeScript. 00:10:30.520 |
And so because WNB Weave supports OTEL, open telemetry, and it's an open protocol, your TypeScript 00:10:37.380 |
agent, it took me a few minutes without changing much code to just send those traces into Weave 00:10:42.940 |
Weave from a TypeScript agent and not necessarily from a Python agent. 00:10:45.520 |
So here you can see in the green, the client traces are in the green, and then the server 00:10:52.940 |
traces actually show what happens within those calls on kind of the server side as well. 00:11:01.100 |
So how did you actually get the traces into Weave? 00:11:06.240 |
We just defined WNB Weave as the OTLP endpoint, a standard that you kind of showed me around. 00:11:12.100 |
And then folks can send their traces into 1b.ai/otel. 00:11:15.780 |
And all you need to do in addition to this is authorize. 00:11:18.240 |
So add authorization headers, and specify which project you want to go into. 00:11:24.580 |
So while we talked to you about the userability, while I was working on this, I had a magic moment 00:11:33.000 |
So I used Cloud Opus 4 that just came out to Weaveify your agent that you built and to add 00:11:40.360 |
this MCP userability, and WNB Weave is going to get a little meta, stay with us, also has 00:11:49.480 |
So we have an MCP server that lets your agents or chats, et cetera, talk to your traces and 00:11:57.660 |
It's been configured in my Winsurf, and Cloud Opus 4 was able to use this MCP server to kind 00:12:07.280 |
The agent basically started working on your code, and then decided, okay, I'm going to run 00:12:12.520 |
the code, and then said, okay, I'm going to go and actually see if the traces showed up 00:12:17.720 |
Then it noticed that they showed up, but they showed up incorrectly. 00:12:21.060 |
So some input or output, a specific parameter that it needed to do, it didn't know how to 00:12:26.900 |
And so the next moment just absolutely blew my mind. 00:12:31.140 |
This Opus 4 discovered that our MCP server exposes a support bot. 00:12:36.840 |
So essentially, another agent decided to write a query for it, received the right information 00:12:43.220 |
after a while, and acted upon this information, learned how to fix the thing that it needed to 00:12:47.280 |
fix fixed it, and then went back to notice whether or not the fix was correct. 00:12:52.620 |
So my coding agent talked to another agent via support via MCP that it discovered on its own. 00:12:59.000 |
I didn't even know that this ability exists to work on your coding agent in things. 00:13:04.000 |
The things got a little bit meta, and my head was like absurd. 00:13:06.000 |
I was sitting like this while all this happened. 00:13:13.000 |
Before we go, I also wanted to take a moment to have an announcement. 00:13:17.000 |
So MCP run will also be exporting telemetry to OTEL-compatible syncs. 00:13:23.000 |
So as I mentioned before, we run both servers and clients. 00:13:27.000 |
So for servers, we have this concept called profiles. 00:13:31.000 |
And these allow you to slice and dice multiple MCP servers into one single virtual server. 00:13:42.000 |
And this is like a single prompt agent that could be triggered via a URL or a schedule. 00:13:47.000 |
And it also just sort of marries with the idea of profiles. 00:13:50.000 |
But yeah, soon you'll be able to get OTEL out of both of these. 00:13:53.000 |
And hopefully, you know, we'll connect up to weights and biases and have a little party. 00:13:58.000 |
You can send those to we've traded from mcp.run. 00:14:02.000 |
So to recap, observability is here in MCP today, but it's not evenly distributed. 00:14:11.000 |
But the community needs to come together creating tooling and conventions to make it smoother. 00:14:19.000 |
You should need to be an expert in observability to get this stuff working. 00:14:25.000 |
Well, AI engineers just start thinking about observability via MCP tooling and whether or not you're getting observability to the end-to-end of your execution chain. 00:14:35.000 |
For tool builders and platform providers, we should join and work on higher-level SDKs. 00:14:41.000 |
So Arise's open inference, for example, is a great start. 00:14:45.000 |
But all of us should help with the instrumentation for our clients who use bespoke SDKs to work on conventions also together. 00:14:51.000 |
Ben, can you explain semantic conventions super quick? 00:14:54.000 |
So as we learned earlier, spans, they carry user-defined attributes, right? 00:15:00.000 |
So if they're user-defined, how does the sync know that a span is actually, say, an HTTP request with a 200 status code? 00:15:07.000 |
Or how does it know that it's an MCP tool call that has an error? 00:15:15.000 |
And you can be a part of defining what the conventions are for agents that all observability platforms agree on. 00:15:21.000 |
And if you're interested in this, I would suggest going to check out the Gen.AI semantic conventions effort by the OTEL team. 00:15:29.000 |
And yeah, lastly, for platform builders such as MCP Run, you know, go add OTEL support, help review RFCs, 00:15:37.000 |
and finally, yeah, just come like talk to us about ideas because we're just, everything's just kind of coming together. 00:15:43.000 |
Everything's so new and fresh and we don't really know exactly what to do. 00:15:46.000 |
There's an additional track here at AI Engineer. 00:15:51.000 |
And I've learned more about the stuff that we were talking about out there by actually talking to people who implement this 00:15:56.000 |
than I learned while preparing before the talk. 00:16:06.000 |
And my call to action here would just be go check out MCP Run. 00:16:14.000 |
Check out WMB Weave MCP OP to learn how to trace MCP with OTEL. 00:16:21.000 |
I'm also, I did the observable tools initiative. 00:16:23.000 |
I would love for you to check out the manifesto to see if this resonates with you, to join forces to talk about observability. 00:16:32.000 |
We have some very interesting surprises for you. 00:16:34.000 |
We have a robotic dog right here that's observable. 00:16:38.000 |
I want to send Swix a huge, huge shout out for giving me the support to show up here. 00:16:44.000 |
And if you guys are interested in AI news, we're going to record an episode tomorrow.