The State of MCP observability: Observable.tools — Alex Volkov and Benjamin Eckel, W&B and Dylibso

00:00:00.000 | Alex Volkov: Hey, folks. My name is Alex Volkov. I'm an AI evangelist with Weights and Biases.

00:00:20.000 | Benjamin Echol: I'm Benjamin Echol. I'm co-founder and CTO of Dialypso. We're creators of mcp.run.

00:00:25.000 | Benjamin Echol: All right. And we're here to talk to you about mcp observability. Hey, Ben. I want to ask you a question.

00:00:32.000 | Benjamin Echol: Yeah, sure.

00:00:33.000 | Benjamin Echol: Somebody who worked at Datadog before and somebody who runs multiple mcp servers and clients on production,

00:00:39.000 | something that happened, advice that happened, something in my agent in production the other day.

00:00:44.000 | Benjamin Echol: Okay. Yeah. I mean, we've been running mcp clients and servers in production since the beginning.

00:00:50.000 | Benjamin Echol: Yeah, but wait. Aren't you like working at an observability company, Weights and Biases?

00:00:55.000 | Benjamin Echol: And don't you work on, like, what's it called? Weave?

00:00:57.000 | Benjamin Echol: Yep. That's true. I work on Weave. But since I started adding some powers to my agent via mcp,

00:01:04.000 | all that observability that I'm used to from just having my own code run end to end has gone a little bit dark.

00:01:10.000 | Benjamin Echol: Gotcha.

00:01:11.000 | Benjamin Echol: So this is what we're here to talk to you guys about. The rise of mcp is creating an observability blind spot.

00:01:17.000 | Benjamin Echol: As AI agents become more prevalent, the problem can compound with more and more tools via mcps,

00:01:23.000 | the less the developers can know about the end-to-end happenings within their agent.

00:01:28.000 | Benjamin Echol: Yeah. Yeah. So on mcp run, we're running both clients and servers.

00:01:32.000 | Benjamin Echol: And because it's a new ecosystem, we've had to, like, cobble together a lot of our own ways to do observability.

00:01:38.000 | Benjamin Echol: And I've been looking around, and it seems like everyone is sort of doing this in isolation,

00:01:42.000 | as we're solving the same problems. So, you know, we wanted to bring the community together on this issue,

00:01:48.000 | and so today we're going to talk about the state of observability in the mcp ecosystem.

00:01:53.000 | Benjamin Echol: Yep. So why do we care about this, and why do we think that you guys should care about this?

00:01:57.000 | Benjamin Echol: So if you don't have the ability to quickly understand why things went wrong on production,

00:02:01.000 | where they went wrong, and how, your ability to quickly respond is greatly diminished.

00:02:06.000 | Benjamin Echol: And we care deeply about -- we both build tools that need mcp observability,

00:02:11.000 | and we support mcp, and we both care deeply about developer experience as well.

00:02:16.000 | Benjamin Echol: Yeah. It's really important to me because enterprise engineering teams

00:02:19.000 | don't ship something to production unless they know for sure that they're going to be able to identify

00:02:23.000 | security and reliability problems before their customers do.

00:02:27.000 | Benjamin Echol: And that's why they invest a ton of money in observability platforms.

00:02:30.000 | Benjamin Echol: And so if you're going to ship mcp to these production environments,

00:02:34.000 | you must seamlessly integrate with these observability platforms.

00:02:38.000 | Benjamin Echol: Yep. So because we care deeply about developer experience at WNB Weave,

00:02:44.000 | Benjamin Echol: I'm happy to announce here on stage that Weave supports MCP. Yay!

00:02:48.000 | Benjamin Echol: As long as you're a developer of both the client and the server,

00:02:51.000 | Benjamin Echol: all you need to do is set this MCP trace list operation environment variable

00:02:55.000 | Benjamin Echol: on your client and server, and we'll show you the list tool calls,

00:02:59.000 | Benjamin Echol: and we'll show you the duration of your MCP calls.

00:03:02.000 | Benjamin Echol: This works currently with our Python-based clients, and this is how it looks.

00:03:06.000 | Benjamin Echol: Super quick. With the red arrows, you can see the client traces, for example,

00:03:10.000 | Benjamin Echol: and with the blue arrows, you can see we're pointing to the Calculate BMI tool

00:03:15.000 | Benjamin Echol: and the other tool, and that's it. Observability solved, right?

00:03:18.000 | Benjamin Echol: Yeah.

00:03:19.000 | Benjamin Echol: Let's get off the stage. We're done.

00:03:20.000 | Benjamin Echol: No, wait a second.

00:03:21.000 | Benjamin Echol: So what about this Calculate BMI tool, this MCP server? Why can't I see into that?

00:03:27.000 | Benjamin Echol: Yeah, we're working on this.

00:03:30.000 | Benjamin Echol: Yeah, also this seems like this is specific to Weave, right?

00:03:34.000 | Benjamin Echol: Is there not like a vendor-neutral way to do this and standardize?

00:03:37.000 | Benjamin Echol: Yeah, that's right.

00:03:38.000 | Benjamin Echol: This is a bespoke integration that we built into Weave into our SDKs in Python,

00:03:42.000 | Benjamin Echol: and while working on this, while our developers have been building this, like,

00:03:46.000 | Benjamin Echol: integration within our MCP tooling, I was advocating internally and externally

00:03:51.000 | Benjamin Echol: that we should align with the open nature of MCP as a concept

00:03:54.000 | Benjamin Echol: and created observable.tools. Maybe some of you have seen this.

00:03:57.000 | Benjamin Echol: This is a manifesto to drive a conversation that this is a problem that needs solving.

00:04:01.000 | Benjamin Echol: And between observability providers such as us and other folks that's been on stage before

00:04:07.000 | Benjamin Echol: and going to be on the evil track tomorrow, to do observability in a vendor-neutral

00:04:11.000 | Benjamin Echol: and standardized way.

00:04:12.000 | Benjamin Echol: And so while working on observable tools, I realized-- I did some search-- realized

00:04:16.000 | Benjamin Echol: that a vendor-neutral, scalable way to add observability exists.

00:04:20.000 | Benjamin Echol: And there could be a great way to marry the two open protocols to work together.

00:04:24.000 | Benjamin Echol: Yeah, exactly.

00:04:25.000 | Benjamin Echol: Fortunately, MCP-powered agents are really just another distributed system, and we've been doing that for decades.

00:04:31.000 | Benjamin Echol: So open telemetry is just the way that we've settled on doing that.

00:04:36.000 | Benjamin Echol: We're going to talk about OTEL a little bit.

00:04:38.000 | Benjamin Echol: If you're not familiar with it, we need to learn about a few primitives first.

00:04:42.000 | Benjamin Echol: So the main primitive that we need to learn about is the trace.

00:04:46.000 | Benjamin Echol: So a trace is kind of like an atomic operation in your system.

00:04:49.000 | Benjamin Echol: It's made up of a tree-like structure of steps that we call spans.

00:04:53.000 | Benjamin Echol: And a span represents the duration and some arbitrary metadata for each step.

00:04:58.000 | Benjamin Echol: And what this step is exactly is completely up to you to define.

00:05:01.000 | Benjamin Echol: It can be as high-level as like an HTTP request.

00:05:04.000 | Benjamin Echol: It can be as low-level as a tiny little function call.

00:05:06.000 | Benjamin Echol: Here's an example of like a checkout experience, an API for a checkout.

00:05:11.000 | Benjamin Echol: The size and position of each of these spans correspond to how long it took

00:05:15.000 | Benjamin Echol: and where it sits in the call graph, respectively.

00:05:17.000 | Benjamin Echol: And just from this data, you can tell a lot about a system and how to observe it.

00:05:21.000 | Benjamin Echol: The other primitive you need to be aware of is syncs.

00:05:25.000 | Benjamin Echol: So a sync is kind of like a centralized database where all your telemetry goes.

00:05:30.000 | Benjamin Echol: But often they come in the form of this whole platform with like a UI and dashboards

00:05:34.000 | Benjamin Echol: and alerting and monitoring and all those things.

00:05:37.000 | Benjamin Echol: So there's a lot of logos here, Ben.

00:05:39.000 | Benjamin Echol: Yeah.

00:05:40.000 | Benjamin Echol: Basically a sync is an open standard way for folks like collectors to receive those spans.

00:05:45.000 | Benjamin Echol: As long as the developer instrumented their application code in a certain standard

00:05:50.000 | Benjamin Echol: spec way, everybody can just receive those in the same unified way, right?

00:05:53.000 | Benjamin Echol: Exactly.

00:05:54.000 | Benjamin Echol: Yeah.

00:05:55.000 | Benjamin Echol: If you squint, it's just kind of like a bunch of databases that all support

00:05:57.000 | Benjamin Echol: the same schema and wired protocol.

00:05:59.000 | Benjamin Echol: And in fact--

00:06:00.000 | Benjamin Echol: You can switch them out.

00:06:01.000 | Benjamin Echol: And in fact, they don't have to change much of their code or even change the code

00:06:03.000 | Benjamin Echol: No.

00:06:04.000 | Benjamin Echol: It could be just config, right?

00:06:05.000 | Benjamin Echol: Right.

00:06:06.000 | Benjamin Echol: By the way, LM observability tools like WNBweave and some friends,

00:06:09.000 | Benjamin Echol: Simon from here before and some other friends, all have switched to support

00:06:13.000 | O-Tel as well, OpenTelemetry, is becoming like this global standard.

00:06:16.000 | Benjamin Echol: Great.

00:06:17.000 | Benjamin Echol: Yeah, another great thing about having a centralized sync is the last concept,

00:06:22.000 | distributed tracing.

00:06:23.000 | Benjamin Echol: So going back to our checkout endpoint, if the fraud service sends its span to

00:06:28.000 | the same sync, then we can stitch back together the traces and show the whole context.

00:06:33.000 | Benjamin Echol: So maybe you're kind of seeing where the MCP server stuff comes in here.

00:06:36.000 | Benjamin Echol: Yeah.

00:06:37.000 | Benjamin Echol: So, hey Ben, if it's possible via the integration to the open protocol, what

00:06:42.000 | if I want to use MCP servers that other people host, like GitHub, like Stripe, like other folks?

00:06:47.000 | Benjamin Echol: Yeah, that's a good question.

00:06:48.000 | Benjamin Echol: So with MCP-enabled agents, or really just any distributed system, there are

00:06:53.000 | kind of two scenarios.

00:06:54.000 | There's when the client and server are in different domains, and then there's when they're

00:06:59.000 | in the same domain.

00:07:00.000 | And by domain here, I don't necessarily mean the literal definition.

00:07:03.000 | Benjamin Echol: I just mean like the administrative domain of control, right?

00:07:06.000 | Benjamin Echol: Like do you own this MCP server?

00:07:08.000 | Benjamin Echol: Do you own this MCP client?

00:07:09.000 | Benjamin Echol: Or is it a third party thing?

00:07:11.000 | Benjamin Echol: So your GitHub Stripe example is like a great example of like the different

00:07:16.000 | domain scenario.

00:07:17.000 | Benjamin Echol: So this is a trace of an agent that is executing the prompt, read and summarize

00:07:23.000 | the top article on Hacker News.

00:07:25.000 | Benjamin Echol: So it's going to reach out to this like remote fetch server to read Hacker News.

00:07:29.000 | Benjamin Echol: But it appears to us that the trace is a single service span because it runs

00:07:33.000 | outside of our domain of control.

00:07:34.000 | Benjamin Echol: So it appears to a black box to us.

00:07:38.000 | Benjamin Echol: But suppose we do own the server.

00:07:40.000 | Benjamin Echol: Like maybe it's running in a different data center than the client.

00:07:44.000 | Benjamin Echol: How do we get actually the whole context?

00:07:46.000 | Benjamin Echol: It's pretty simple.

00:07:47.000 | Benjamin Echol: So with distributed tracing and context propagation, we can have the remote fetch

00:07:52.000 | server, send its spans to the same sync as the client.

00:07:55.000 | Benjamin Echol: And the sync will just stitch together the missing parts of the trace back

00:07:58.000 | for us.

00:07:59.000 | Benjamin Echol: So in this graphic, you can see that we can now break into that fetch server.

00:08:02.000 | Benjamin Echol: And we can see what it's doing.

00:08:04.000 | Benjamin Echol: It's making some HTTP request that's taking roughly 350 milliseconds and then

00:08:08.000 | it's doing a little crunching to create some markdown.

00:08:12.000 | Benjamin Echol: Okay.

00:08:14.000 | Benjamin Echol: Okay.

00:08:15.000 | Benjamin Echol: So that's great in theory.

00:08:16.000 | Benjamin Echol: And we ran through this.

00:08:17.000 | Benjamin Echol: We can have a whole hour talking about OTEL.

00:08:19.000 | Benjamin Echol: Not that we've got an hour.

00:08:20.000 | Benjamin Echol: But how do we can actually marry those two protocols together, right?

00:08:24.000 | Benjamin Echol: Is there a standard way?

00:08:25.000 | Benjamin Echol: Did the MCP spec folk deploy a way for us for observability?

00:08:29.000 | Benjamin Echol: Not quite.

00:08:30.000 | Benjamin Echol: It was pretty tricky to get working.

00:08:33.000 | Benjamin Echol: It does work today, but it required a little bit more work than it should

00:08:38.000 | have.

00:08:39.000 | Benjamin Echol: So in order to do this, we need to, as I said, propagate the trace context from

00:08:42.000 | Benjamin Echol: From the client to the server.

00:08:44.000 | Benjamin Echol: So here's a TypeScript example.

00:08:46.000 | Benjamin Echol: And when we call a tool in the client, we're going to extract our current

00:08:50.300 | span.

00:08:51.300 | Benjamin Echol: And we're going to pass it along to the server.

00:08:54.880 | Benjamin Echol: And we achieve this by basically just shuttling the data through the protocol's

00:08:58.800 | meta payload.

00:09:00.260 | Benjamin Echol: And now that we're inside the server, this would be like in the fetch server,

00:09:06.320 | Benjamin Echol: We can pull that trace context out, inherit it as our current span.

00:09:11.320 | Benjamin Echol: And then when we send our spans off to the sink, it's as if it came from that

00:09:16.120 | parent span.

00:09:17.120 | Benjamin Echol: And the sink can stitch it back together.

00:09:18.700 | Benjamin Echol: And this is awesome.

00:09:20.200 | Benjamin Echol: So you basically used an undocumented kind of property of sending the payload together

00:09:25.280 | with the payload between clients and servers to pass along the data that OTL needs to connect

00:09:30.740 | those things together, right?

00:09:31.740 | Benjamin Echol: Yeah, sort of.

00:09:32.740 | Benjamin Echol: I just kind of had to abuse the lower-level interface reserved for the protocol.

00:09:36.620 | But higher-level ways should be provided through tooling, and that's something we should talk

00:09:40.460 | about a little bit later in the talk.

00:09:42.460 | Benjamin Echol: Yeah.

00:09:44.460 | Benjamin Echol: Oh yeah.

00:09:45.600 | Benjamin Echol: So by the way, this is not just a screenshot.

00:09:49.460 | This is a working demo, so it's a lot more code than what I showed in the slide.

00:09:54.540 | Benjamin Echol: So if you want to actually go see how this works and adapt this for your

00:09:57.120 | needs, go check out this GitHub link.

00:09:59.760 | And I think actually you did that to get it to work with Weave, right?

00:10:02.260 | Benjamin Echol: Yeah.

00:10:03.260 | So now that we know how to pass context after you showed me the way, let's see how amazing

00:10:07.840 | this solution actually is in practice.

00:10:09.880 | While Weave MCP, the thing I showed you guys before, was a bespoke solution baked into our

00:10:14.180 | Python SDK for Weave, the huge benefit of MCP, generally not only observability-related,

00:10:19.800 | is that servers and clients don't have to run on the same environment or share the same

00:10:23.800 | code or be from the same programming language.

00:10:26.380 | So while we were working on the Python SDK, you built an agent in TypeScript.

00:10:30.520 | And so because WNB Weave supports OTEL, open telemetry, and it's an open protocol, your TypeScript

00:10:37.380 | agent, it took me a few minutes without changing much code to just send those traces into Weave

00:10:42.940 | Weave from a TypeScript agent and not necessarily from a Python agent.

00:10:45.520 | So here you can see in the green, the client traces are in the green, and then the server

00:10:52.940 | traces actually show what happens within those calls on kind of the server side as well.

00:10:59.520 | Yeah, it's really cool.

00:11:01.100 | So how did you actually get the traces into Weave?

00:11:03.100 | So this is very, very simple.

00:11:05.100 | Way simpler than before.

00:11:06.240 | We just defined WNB Weave as the OTLP endpoint, a standard that you kind of showed me around.

00:11:12.100 | And then folks can send their traces into 1b.ai/otel.

00:11:15.780 | And all you need to do in addition to this is authorize.

00:11:18.240 | So add authorization headers, and specify which project you want to go into.

00:11:22.580 | Cool.

00:11:23.580 | Yep.

00:11:24.580 | So while we talked to you about the userability, while I was working on this, I had a magic moment

00:11:28.000 | happening with MCP.

00:11:29.000 | I wanted to share this with everybody.

00:11:30.000 | Yeah, okay.

00:11:31.000 | I love that MCP story.

00:11:32.000 | Yeah.

00:11:33.000 | So I used Cloud Opus 4 that just came out to Weaveify your agent that you built and to add

00:11:40.360 | this MCP userability, and WNB Weave is going to get a little meta, stay with us, also has

00:11:46.480 | an MCP server.

00:11:47.480 | Okay.

00:11:48.480 | What does it do?

00:11:49.480 | So we have an MCP server that lets your agents or chats, et cetera, talk to your traces and

00:11:53.600 | see the data and summarize the data for you.

00:11:55.660 | Okay.

00:11:56.660 | So we have this MCP.

00:11:57.660 | It's been configured in my Winsurf, and Cloud Opus 4 was able to use this MCP server to kind

00:12:04.360 | of work through it.

00:12:05.360 | So here you see an example.

00:12:07.280 | The agent basically started working on your code, and then decided, okay, I'm going to run

00:12:12.520 | the code, and then said, okay, I'm going to go and actually see if the traces showed up

00:12:16.720 | at WNB Weave.

00:12:17.720 | Then it noticed that they showed up, but they showed up incorrectly.

00:12:21.060 | So some input or output, a specific parameter that it needed to do, it didn't know how to

00:12:24.900 | do.

00:12:25.900 | the documentation.

00:12:26.900 | And so the next moment just absolutely blew my mind.

00:12:31.140 | This Opus 4 discovered that our MCP server exposes a support bot.

00:12:36.840 | So essentially, another agent decided to write a query for it, received the right information

00:12:43.220 | after a while, and acted upon this information, learned how to fix the thing that it needed to

00:12:47.280 | fix fixed it, and then went back to notice whether or not the fix was correct.

00:12:52.620 | So my coding agent talked to another agent via support via MCP that it discovered on its own.

00:12:59.000 | I didn't even know that this ability exists to work on your coding agent in things.

00:13:04.000 | The things got a little bit meta, and my head was like absurd.

00:13:06.000 | I was sitting like this while all this happened.

00:13:08.000 | Didn't touch the keyboard once.

00:13:09.000 | That's awesome.

00:13:10.000 | Yeah.

00:13:11.000 | It's pretty meta.

00:13:12.000 | Yeah.

00:13:13.000 | Before we go, I also wanted to take a moment to have an announcement.

00:13:17.000 | So MCP run will also be exporting telemetry to OTEL-compatible syncs.

00:13:23.000 | So as I mentioned before, we run both servers and clients.

00:13:27.000 | So for servers, we have this concept called profiles.

00:13:31.000 | And these allow you to slice and dice multiple MCP servers into one single virtual server.

00:13:38.000 | And we also have an MCP client called task.

00:13:42.000 | And this is like a single prompt agent that could be triggered via a URL or a schedule.

00:13:47.000 | And it also just sort of marries with the idea of profiles.

00:13:50.000 | But yeah, soon you'll be able to get OTEL out of both of these.

00:13:53.000 | And hopefully, you know, we'll connect up to weights and biases and have a little party.

00:13:57.000 | Yeah.

00:13:58.000 | You can send those to we've traded from mcp.run.

00:14:01.000 | Okay.

00:14:02.000 | So to recap, observability is here in MCP today, but it's not evenly distributed.

00:14:07.000 | It's not evenly distributed.

00:14:09.000 | OTEL should get you most of the way there.

00:14:11.000 | But the community needs to come together creating tooling and conventions to make it smoother.

00:14:19.000 | You should need to be an expert in observability to get this stuff working.

00:14:23.000 | So how do you get involved?

00:14:25.000 | Well, AI engineers just start thinking about observability via MCP tooling and whether or not you're getting observability to the end-to-end of your execution chain.

00:14:35.000 | For tool builders and platform providers, we should join and work on higher-level SDKs.

00:14:41.000 | So Arise's open inference, for example, is a great start.

00:14:45.000 | But all of us should help with the instrumentation for our clients who use bespoke SDKs to work on conventions also together.

00:14:51.000 | Ben, can you explain semantic conventions super quick?

00:14:53.000 | Yeah, sure.

00:14:54.000 | So as we learned earlier, spans, they carry user-defined attributes, right?

00:15:00.000 | So if they're user-defined, how does the sync know that a span is actually, say, an HTTP request with a 200 status code?

00:15:07.000 | Or how does it know that it's an MCP tool call that has an error?

00:15:12.000 | That's where semantic conventions come in.

00:15:15.000 | And you can be a part of defining what the conventions are for agents that all observability platforms agree on.

00:15:21.000 | And if you're interested in this, I would suggest going to check out the Gen.AI semantic conventions effort by the OTEL team.

00:15:29.000 | And yeah, lastly, for platform builders such as MCP Run, you know, go add OTEL support, help review RFCs,

00:15:37.000 | and finally, yeah, just come like talk to us about ideas because we're just, everything's just kind of coming together.

00:15:43.000 | Everything's so new and fresh and we don't really know exactly what to do.

00:15:46.000 | There's an additional track here at AI Engineer.

00:15:50.000 | This is called the hallway track.

00:15:51.000 | And I've learned more about the stuff that we were talking about out there by actually talking to people who implement this

00:15:56.000 | than I learned while preparing before the talk.

00:15:59.000 | It's quite incredible.

00:16:00.000 | So, Ben?

00:16:02.000 | Yeah, sure.

00:16:03.000 | Yeah, so again, I'm Ben.

00:16:06.000 | And my call to action here would just be go check out MCP Run.

00:16:10.000 | You can get a free account.

00:16:11.000 | Try it out.

00:16:12.000 | Yeah, that's it.

00:16:13.000 | And I'm Alex.

00:16:14.000 | Check out WMB Weave MCP OP to learn how to trace MCP with OTEL.

00:16:21.000 | I'm also, I did the observable tools initiative.

00:16:23.000 | I would love for you to check out the manifesto to see if this resonates with you, to join forces to talk about observability.

00:16:29.000 | And, yeah, please visit us at the booth.

00:16:32.000 | We have some very interesting surprises for you.

00:16:34.000 | We have a robotic dog right here that's observable.

00:16:36.000 | I also run the Thursday AI podcast.

00:16:38.000 | I want to send Swix a huge, huge shout out for giving me the support to show up here.

00:16:44.000 | And if you guys are interested in AI news, we're going to record an episode tomorrow.

00:16:48.000 | That's it.

00:16:49.000 | Thank you so much.

00:16:50.000 | We'll see you next time.

00:16:51.000 | Bye.

00:16:51.000 | We'll see you next time.