MCP at Sourcegraph | Code w/ Claude

. All right. How's everyone doing today? Cool. Awesome. So my name is Biong. I'm the CTO and co-founder of a company called Sourcegraph. There might be some of you who have used our products inside your development organizations. We build developer tools for professional engineers working in large, complex code bases.

And for those of you who haven't heard of us, we actually serve-- I think it's like seven of the top 10 software engineering companies by market cap, and six of the top 10 US banks, and just multitudes of companies building software and writing code across basically every industry vertical.

And today, I'm here to talk about our journey with MCP, and in particular, how we're integrating the model context protocol deeply into the fabric of our architecture. So our journey actually began quite some time ago. It was actually summer of last year when this fellow, David, at Anthropic-- maybe you've heard of him.

He's now one of the co-creators of MCP. He reached out and said, hey, I heard you guys are doing a lot with retrieval augmented code generation and fetching the appropriate context in the context window and getting models to perform better on coding and technical question answering. We're working on a thing that you might find interesting.

And we're like, huh, that sounds interesting. What is it? And he was like, it's kind of like LSP, but for model context. And we're like, wow, that does sound interesting. And so we started chatting, and we ended up becoming one of the early design partners for the MCP protocol and gave a lot of feedback.

And it was really a privilege to work with David and the team to kind of guide the evolution of that protocol. And in these conversations, as we're playing more around with the protocol and experimenting with tools in conjunction with the developer tools that we're building, we soon came to this realization that, holy crap, like AI is changing everything, but everything is about to change again.

And specifically, we felt that tool calling models in conjunction with MCP was going to lead to another paradigm shift in the standard AI application architecture. And I think we've been through, I would say, like three waves of AI application architecture so far. The first wave was sort of the co-pilot wave, where basically the architecture of those applications was dictated by the capabilities of the first LLMs that were pushed into production.

So if you go back to the ancient year of 2022 and remember what AI was like back then, all the models then hadn't yet been tuned to respond in a chat fashion or use tools. They were just these kind of like text completion models. And so all the big applications that you saw built on top of AI followed this paradigm.

You know, the human would type some stuff, and then the model would complete the next couple of tokens, and the human would type some more. And that was kind of the interaction paradigm. And then ChatGPT came along, and that ushered in a new modality. So everyone soon realized, wow, being able to chat with this thing and make explicit asks is really powerful.

And one of the things that we soon realized in that world was like, hey, if you copy and paste relevance host snippets into the context window and then ask it to answer the question, it gets a lot better in terms of quality and usefulness on production code bases. And that's what I like to call the rag chat era of AI.

And I think a lot of folks are still living in this world to a certain extent. But you all, the folks at this conference, I think we are all aware that we're kind of entering, have already entered into a new era, which is the era of agents. And just as the two generations before, this era is really being dictated by the capabilities at the model layer.

And so when we took a look at our tooling suite that we had built so far, the more we looked, the more we realized, my gosh, a lot of the underlying assumptions of building on top of LLMs have changed with tool calling agents and MCP. We might have to rethink this application from the ground up to build truly agentically.

And so that's essentially what we did. We built a completely new coding agent called AMP from the ground up. And I'll show off to you what it's able to do. And I think the best way to talk about how we've been building with the tool calling models that Anthropics have been shipping in conjunction with the model context protocol is to show you AMP in action and show you it using a bunch of tools to complete tasks.

So this demo, I think, is going to be-- I've been watching all the talks. And I think this might be the longest live demo of the day. So all the prayers that you said for Brad and other folks, please say for me this is a live AI demo. What we're going to do is we're going to make a live change to AMP itself.

So I have described the change I want in this linear issue. It's simple because this is a demo. I want to make it easy to grok. And basically, all we're going to do is we're going to change the background panel of AMP to the color red. I've described it in that linear issue.

I've given some instructions just as I would in an actual linear issue that I handed off to an engineering member of my team. And I'm just going to have it implement the issue. So implement. I'll just paste in the URL of the issue. And the first thing it does is it actually uses a linear tool that it has provided through an MCP server to fetch the contents of that linear issue.

So the issue that we're just looking at, now the model has access to. I didn't have to add mention it or prompt it in any special way. It just knew to use that tool to fetch that piece of context. And it's going to do a couple more agentic steps here to find the appropriate context within the code base.

But I just want to point out that this linear tool, it's not a first party tool. It is actually the official linear MCP server, which I currently think is one of the best MCP servers out there. The way we've implemented it is you just plug in the URL. We actually have this remote MCP proxy that I'll talk about a little bit later that kind of secures the connection and handles the secret exchange with linear or whatever upstream service you're talking to.

And that's how we're integrating this capability into our coding agent. So it's going to do a bunch more things. Going to search around. We'll just let it go for a bit. In the meantime, I thought what we could do is talk a little bit more about the AMP architecture and how MCP plays into that.

And I could just share some slides about how that architecture looks. But I thought, maybe let's just use AMP to tell you about the AMP architecture. So here, I'm going to open up the AMP CLI. This has access to all the same tools. It integrates MCP servers in the same way as the editor integration that you just saw.

And let me just ask it. What are the main architectural components of AMP? How does MCP play into this? So we're going to let that run. And then while that's running, I actually just became aware of this phenomenon. I don't know if any of you are aware that there's this thing people do now where they watch two unrelated YouTube videos side by side, it's like a Gen Z thing.

Maybe sometimes it's like a TikTok, sometimes it's like three videos. I thought, what is the coding equivalent of that that agents unlock? And so I was thinking, OK, instead of playing a game or watching a video, why don't we just make a game on the side while we're figuring out all that other stuff out?

So let's use MCP again and say, find the linear issue about 3D Flappy Bird. I wrote a very detailed spec ahead of time. And we're going to try to vibe code 3D Flappy Bird on the side while all this happens. So over here, we got a pretty detailed textual explanation of how MCP is integrated into AMP.

But a picture is worth many more words than just text. So why don't we ask it, can you draw a diagram showing me how these components connect and communicate? And we'll have it do that. In the meantime, let's check back over here to see how our agent is doing.

It's making some changes to the code still. One of the things I want to point out here is that another tool that it's integrating-- so it's using a lot of tool calls along the way. Each one of these long text things is a tool call. It's interacting with the browser.

It actually uses this other MCP server, the Playwright MCP server, to interact with the browser and take screenshots. And it's going to use that as part of its feedback loop to verify that it actually made the change that we're telling it to make. So let me just reload the window here to see if it's done yet.

Nope, still working on that. Let me actually just restart this. Sometimes it gets confused. But the beauty of agents is if it messes up, it's not-- most of the time it's not worth it diving in to see where it kind of screwed up. I just ask it to rerun, and most of the time it just works.

OK. Over here, let's take a look at the architectural diagram that it generated and see how MCP relates to these components. If this were in the editor, it would just show this. And the text is a little bit hard to read. But as you can see, it's got the core pieces of the AMP architecture here.

Everything is routed through this kind of like server-s thread component, which talks to the MCP integration, and which in turn talks to all the services that we're integrating through the model context protocol. So that's pretty cool, right? Streamline onboarding makes it really easy to grok what's happening in a large-scale code base.

OK. Now let's see if 3D Flappy Bird is done here. So it looks like it got as far as running a Python web server. So let's go over and copy this port. All right, so we have 3D Flappy Bird. That's pretty cool. Successfully wrote an app on the side.

And then let's finally check back in on the first thing, which is did it turn the thing red? And there we go. So while I was explaining to you how AMP works, AMP used tools and MCP servers to basically make a change in itself, explain its architecture, and code up a minigame on the side.

So yeah. Oh, and also it marked the linear issue as done, which is cool. Use that MCP server again. So that's just kind of a brief illustration of the power of MCP and the sort of results we're getting in building this coding agent. Can we go back to the slides?

And we've really incorporated MCP in a very deep way. So one of the things about AMP's architecture is if you look at the different components, we have an AMP client, and there's an AMP server, and then there's these external services and local tools that we want to talk to.

It turns out the way to effectively connect to all these different types of tools and services, it can be done through MCP. So when we're talking to local tools like Playwright or Postgres, that's going over MCP or standard I/O. When we're talking to external services, whether it be first party services like Sourcegraph or Code Search Engine, which is really good at searching over large-scale code bases, or other services such as your issue tracker or your observability tool, that talks MCP as well.

And part of the work that we've done, which I'll touch upon in a little bit, is also connect these MCP connections through a way that securely handles secrets and forwards the identity of the user to the appropriate external services. One of the things that we realized in building this is that there is kind of like a new emerging recipe for AI applications or AI agents.

So just like RagChat was kind of the model for the previous era, I think this is the rough formula for the new era. And we actually wrote a blog post about how this is not like arcane magic. Really, anyone can write an agent probably in the time it takes to listen to this talk.

And so if any of you want to try your own hand at writing a simple coding agent, just go to that blog post, and it shows you how. But the recipe we found is you need maybe like four things. One is you need a really strong tool to use LLM, which the latest Cloud model provides.

We're really excited about Cloud 4's capabilities. We've been playing around with it for the past couple of weeks. All the stuff that I just demoed was running off of Cloud 4. In conjunction with that tool calling model, you need a way to provide a bunch of tools. And MCP just so happens to be the perfect solution for that.

So tool use LLM, MCP. And then when it comes down to thinking about the actual user experience, what we found is really important is to really focus on the feedback loops. So as you saw with the AMP agent as it was making a change to itself, what it was doing was it was using the Playwright MCP server to take screenshots of changes it was making to the app along the way and using those screenshots to validate whether it was doing the thing it was doing.

And that change was actually pretty non-trivial, because the component hierarchy has a lot of containers. And sometimes the change looks right, but doesn't actually change the colors in the actual application. And so that feedback cycle is really essential to making agents work in practice. And part of this, too, is if you design the feedback loops properly, our thesis is that the UX becomes a lot more imperative.

So I think like the previous era of AI applications, there was a lot of UX Chrome and a lot of UI built around figuring out how to invoke the chat-based models sort of in situ in the right situation in the application, a lot of manual context selection. Tool calling LLMs plus tool use, it's a really powerful paradigm.

And oftentimes, the best interaction we've found is just ask the agent to do it and then refine the feedback loops so that it's able to get those things done reliably. In terms of tool usage in AMP, some of our top tools are provided through MCP servers. So our most popular tools are probably the ones I listed above.

There's some local ones like Playwright and Postgres. There's also a great tool to integrate web search. So you can either use Anthropics Web Search API. There's also Brave Web Search, which is really nice. Context 7 is a popular MCP server that pulls in different documentation corpuses and, of course, Linear, which I just showed, which allows you to do this kind of issue to PR workflow.

And then finally, I want to call out Sentry as being just a really strong MCP server. They really focused in on the quality of the description of the tools. And that ends up being really essential to making MCP servers work well in practice. And that actually leads me to one of the pitfalls that we found in integrating MCP servers into our agent, which is one of the traps that we see some people fall into is what I like to call toolmageddon.

So it's this practice of, like, you know, MCP, MCP, MCP. Everyone's excited about MCP right now. So you just want to go plug in, like, two dozen MCP servers, each of which provides, like, a dozen tools. And that sometimes, like, when you think about how it's implemented underneath the hood, each one of those tool descriptions gets shoved into the context window and can confuse the model.

And the model is always getting better. But the more irrelevant stuff that goes into context, the less intelligent it is about making selection among those tools. And it is about sort of like general reasoning and getting the job done. And so in terms of how we baked MCP into our application architecture, in certain cases, we actually limited the set of tools that a particular MCP server provides to a smaller subset that we think are really essential to the workflows that we want to enable.

And roughly speaking, there's kind of three buckets of tools we find really useful. There's the ones that are devoted to finding relevant context. There's the ones that can provide high quality feedback, such as, like, invoking unit tests or invoking the compiler. And then finally, the ones that are involved in submitting done or declaring success, like marking the issue as done, or pinging the user to say, hey, I'm ready for your feedback on this.

I also want to talk a little bit about securing MCP. So that's a high priority for us, given how many of our customers are in kind of like large scale production code bases. So the original MCP spec didn't have anything around auth. They've since integrated OAuth 2 as the kind of designated authentication protocol, which I think was a really smart decision.

It's what we've used in the past to integrate a lot of external services. But that's just the protocol, right? There's still the implementation. And you still have to worry about, like, where you store the secrets. And I think a lot of what you see in the wild in terms of how people are integrating MCP servers, I think the vast majority of tools out there are still using largely MCP or standard I/O.

Even with the existence of remote MCP servers, someone made this NPM plugin that just converts a remote MCP to a local MCP. So the application feels like it's still talking over standard I/O. And it handles the auth handshake, but as a consequence, it just shoves the secrets, like your secret tokens to your other services in some random plain text directory.

And that's kind of like a no-go for a lot of our customers. And so as part of this handshake, we actually implemented a secure secret store where the AMP server takes care of the OAuth handshake, and it proxies the MCP connection from the client to these external services and ensures that no secret ever gets stored unencrypted on your local machine.

OK. In the last couple of minutes, I just want to take some time and speculate about where the future is headed. So there's been a lot of great talk at this conference about what's next after tool calling. Tool calling is such a powerful paradigm. One of the things that we think a lot about is the extent to which sub-agents can be the way that a lot of tools are implemented.

So I didn't point this out earlier, but the way that AMP was actually gathering context about the code base was actually a sub-agent. It wasn't just a deterministic search tool. It was actually its own mini-agent that's going in and invoking different low-level search tools and iterating on itself, reasoning about the context it gathers, refining queries to gather the appropriate context.

And we found that approach works just super, super well. And I think the potential for our sub-agents to become really good tools, we're only scratching the surface right now. There's also this notion of what does it mean to dynamically synthesize a tool? And I think we touched upon this in some of the earlier talks when we were talking about code execution.

So right now, the tool calling paradigm is largely, you have a static list of tools, and the model will go and invoke each one, one by one, look at the output, and then decide what to do next. And a lot of people, not just us, have pointed out, hey, it might be useful if we incorporated a notion of output schema into MCP, which that just got merged in.

And then the model can sort of like plan out how to invoke these tools and compose them and chain them in different ways. And if you squint, at some point, you're basically programming, right? Like you have these functions, you have these tools, you're composing them, you're combining them in interesting ways.

And so I think it's a really good time to revisit code interpreters, which was a thing that, you know, at first was a thing in 2023, but sort of went away for a little bit. I think with the advent of tool calling agents, there's a lot more potential to explore there and something that we're actively considering as well.

I was talking with David from Anthropic earlier this week. He stopped by our office, and we're talking about a lot of the parallels between tool calling LLMs and agents and the kind of discourse that was active when high-level programming languages first became a thing. So before, you know, programming languages settled on, you know, the abstractions that kind of dominate today, there's a lot of discussion around, like, what's the proper abstraction for a subroutine?

Is it a function? Is it message passing? Do-- is there a way to manage concurrent communication effectively? There's different models for that. And I think we're just going to revisit all of that now, because the analogies to, you know, programming languages now with, you know, agents and sub-agents and how they interact with each other and also with deterministic systems, it's just very strong.

And then the last point here is I think the way that most people integrate MCP right now, if you look at the part of the spec that the vast majority of MCP clients implement versus the protocol itself, the entire protocol, I think the protocol designers here were very, very forward-thinking.

There is a lot baked into the protocol like stateful session management, two-way communication, and sampling, which is when the MCP server calls the client to do LLM inference. That's frankly just not being used. Like, the vast majority of MCP connections right now are stateless tool calls that don't even take advantage of streaming.

And so this is such early days. Like, we're literally scratching the tip of the iceberg for what we can do with tool calling LLMs and tools provided through the model context protocol. And I don't know. The next year, I think it's going to get really weird. So I guess, you know, we're super excited to be here today.

It's been a great partnership with Anthropic for the past, let's say almost like three years now. I think we're one of the earliest adopters of Claude for coding purposes. I think we started using it in January 2023. It's been an amazing journey. And we're trying to build AMP from the ground up into this kind of like tool calling native coding agent.

And so if that's of interest of you, check us out. And we look forward to all the things that people kind of build on top of it, build with it, and also a lot of the MCP servers that people will build that we can then integrate. So if you're doing that, please get in touch with us as well.

We'd love to hear from you and figure out how this can fit into the kind of new way of doing software development that we're all just discovering together. With that, do I have time for questions or one question? Any questions? I'll kick us off. You have Sourcegraph Kodi as well.

So how are you thinking about Kodi and AMP playing together? Yeah, that's a great question. So we also have this AI coding assistant called Kodi that was one of the first RagChat context-aware coding assistants. Recall the earlier picture where it was kind of like the three eras of AI.

I think Kodi was close to-- Kodi was an awesome AI application built for the RagChat era of models. And I think there's still a lot of organizations that will find a lot of value from that paradigm and still a lot of workflows that can benefit from that. But because the underlying assumptions of what LLMs can do have changed so much, we think that the best user experience for the agentic world is going to come from an application that was designed from the ground up to take advantage of tool calling and MCP.

And that's why we built AMP as a separate application and thing from Kodi. And if I'm being a little bit snarky, I think if you're not rethinking that architecture, I think as an application developer, you're at risk of falling behind and missing the next wave of AI development. All right, I think that's my time.

Thank you so much. Thank you. Thank you. Thank you. Thank you.

MCP at Sourcegraph | Code w/ Claude

Chapters

Transcript