
How to Secure Agents using OAuth — Jared Hanson (Keycard, Passport.js)


Transcript

- Thanks a lot, everyone. Thanks for coming out. We're gonna talk about a topic that I consider one of the most important topics for what we're doing with AI and agents, which is how to secure agents using OAuth. I'm Jared Hanson. I'm the co-founder of a new company called Keycard, where we're building an identity and access management platform for AI and agents.

I'm also the creator of Passport.js, for any of the Node developers in the audience, a very popular OAuth framework. And previously I was at Auth0, where I built a lot of their core identity infrastructure, and then at Okta. Let's get into it. So I think we're all super excited about what's happening with LLMs and AI-powered applications.

We can bring these things into our daily lives and they automate a lot of the tasks for us. And simply put, agents that are more connected are more useful. So let's connect these agents to more systems. But hold on a second, because today we face an impossible choice. We can give agents broad-based access and accept security risks, or we can limit their capabilities and sacrifice business value.

And this is exemplified pretty well in how we set up MCP servers today, which is we go get API keys that are typically long-lived and broadly scoped. We paste them into some configuration files and environment variables and let our agents run with them. Now, if we continue this pattern for hundreds or thousands of agents, we've got a pretty big security problem on our hands.

Luckily, we know how to fix this. We know how to transition away from static secrets to dynamic access using OAuth. Now, show of hands, how many people are familiar with OAuth in the crowd? I would say quite a few. So I'll burn through this quickly. But just as a quick introduction, I'm not going to lie to anyone.

Like, OAuth is a relatively complicated protocol, especially when you consider all the extensions. But the principles behind it are fairly straightforward and easy to understand. What it is is a protocol for applications, which we call clients in OAuth, to request access to APIs, which we call resource servers. And these requests are mediated by what's known as an authorization server.

If you've ever used anything like Calendly and connected it to your Google Calendar API, you've experienced OAuth in the real world. What's happening there is Calendly sends a request over to Google saying, "Hey, I'd like access to this person's Google Calendar." Google's authorization server then ensures that you're logged in and prompts you for consent that you want this access to occur.

And if you agree to it, Google sends what's known as an access token over to Calendly. And then Calendly can take that access token and go about accessing your calendar. There's a few other interesting bits going on here, like refresh tokens, which basically allows these access tokens to be short-lived and rotated pretty quickly, while still maintaining the authorized connection.
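That redirect-and-token dance can be sketched concretely. Below is roughly what the client's side of building the authorization request looks like; the endpoint, client ID, redirect URI, and scope are all made up for illustration, and PKCE is included since modern OAuth guidance expects it:

```typescript
import { createHash, randomBytes } from "node:crypto";

// All endpoints and identifiers here are made up for illustration.
const AUTHORIZE_URL = "https://auth.example.com/authorize";

// PKCE: a one-time code verifier plus its SHA-256 challenge, so an
// intercepted authorization code can't be redeemed by anyone else.
function pkcePair(): { verifier: string; challenge: string } {
  const verifier = randomBytes(32).toString("base64url");
  const challenge = createHash("sha256").update(verifier).digest("base64url");
  return { verifier, challenge };
}

// The front-channel request the client redirects the user's browser to:
// the "Hey, I'd like access to this person's calendar" step.
function buildAuthorizationUrl(clientId: string, redirectUri: string, scope: string, challenge: string): string {
  const url = new URL(AUTHORIZE_URL);
  url.searchParams.set("response_type", "code");
  url.searchParams.set("client_id", clientId);
  url.searchParams.set("redirect_uri", redirectUri);
  url.searchParams.set("scope", scope);
  url.searchParams.set("state", randomBytes(16).toString("base64url")); // CSRF guard
  url.searchParams.set("code_challenge", challenge);
  url.searchParams.set("code_challenge_method", "S256");
  return url.toString();
}

const { challenge } = pkcePair();
const authUrl = buildAuthorizationUrl("calendly-like-app", "https://app.example.com/callback", "calendar.readonly", challenge);
```

The verifier half of the PKCE pair is held back and sent later, when the client exchanges the authorization code for the access and refresh tokens.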

And in OAuth, we call these types of flows that involve user delegation authorization code flows. And they typically happen via browser-based interfaces that you've seen when you've used these types of applications. Now, one thing that gets kind of confusing for people is that OAuth is oftentimes used to implement things like sign-in with Google or sign-in with Facebook.

And this is confusing because we refer to OAuth as an authorization protocol or a delegated authorization protocol specifically. So what's going on here when we use it for sign-in? Well, this is really just a special case where the API gets replaced with a user info API that just returns claims about the user who logged in.

So their ID, their name, their email address, et cetera. And we kind of use authorization to back our way into authentication. And this became like such a common pattern that people use with OAuth that it got formally standardized as OpenID Connect, which is just an identity layer on top of OAuth that standardizes the response format of that user info API.

It also does a couple of things that are kind of confusing, like introduce more terminology, which identity people are prone to do. We call the authorization server now an identity provider in the scope of OpenID Connect. And applications are known as relying parties. Don't get hung up on the terminology.

It's all the same thing. One other thing that OpenID Connect does is it introduces an ID token. This is simply a JSON web token, which is a cryptographically signed statement about who the user is. This overlaps a lot with the user info API. You can think of it as sort of an optimization that the application can verify itself without making API requests.

It also serves some functions in like ongoing session management between applications and authorization servers, but that's kind of beyond the scope of introductory material here. In the real world, these things get deployed together. We'll typically run authorization and authentication flows in line so that we know who the user is who logged in, as well as get access to things like their Google Calendar.
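As a concrete picture of the ID token idea, here's a minimal sketch of a JWT being signed and then verified locally. The key pair is generated in-process purely for illustration; real relying parties use a JOSE library and the identity provider's published keys, and the claim values here are made up:

```typescript
import { createSign, createVerify, generateKeyPairSync } from "node:crypto";

// Key pair generated in-process purely for illustration; a real identity
// provider signs with its own keys and publishes the public half as a JWKS.
const { privateKey, publicKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });

const b64 = (obj: object) => Buffer.from(JSON.stringify(obj)).toString("base64url");

// An ID token is just a JWT: base64url(header).base64url(claims).signature.
function signIdToken(claims: object): string {
  const signingInput = `${b64({ alg: "RS256", typ: "JWT" })}.${b64(claims)}`;
  const signature = createSign("sha256").update(signingInput).sign(privateKey, "base64url");
  return `${signingInput}.${signature}`;
}

// The relying party verifies the signature itself, no user-info call needed.
function verifyIdToken(token: string): { iss: string; sub: string; email: string } {
  const [header, payload, signature] = token.split(".");
  const ok = createVerify("sha256").update(`${header}.${payload}`).verify(publicKey, signature, "base64url");
  if (!ok) throw new Error("bad signature");
  return JSON.parse(Buffer.from(payload, "base64url").toString());
}

const token = signIdToken({ iss: "https://accounts.example.com", sub: "user-123", email: "ada@example.com" });
const claims = verifyIdToken(token);
```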

One thing to call out that is important here is that there's three roles in OAuth. The client and the resource server, I think, are all relatively straightforward. We understand that from client-server architectures. The client requests resources, and the resource server responds with the data. What gets different is that we introduce this authorization server in the middle that mediates this access, and it mediates it by issuing tokens.

It issues tokens back to the client, which holds them and then presents them to the resource server, and the resource server's job is to verify those tokens. Now, what's the benefit of this sort of model? The main benefit flows to the APIs. They don't have to care about anything to do with authentication anymore.

So, verifying user passwords, doing step-up authentication, running the consent flows. They hand all that job off to the authorization server, and it gets abstracted away by the token, which lets the API verify what has happened. There's also the benefit that we can centralize policy and then deploy ecosystems of apps and APIs, all protected by a central location, and build out the ecosystems that we all know today.

How do we apply this to MCP and agents in particular? Well, it should be pretty simple. Now, our applications get replaced by a chatbot or agent like Claude that we want to connect to MCP servers. The MCP clients and the MCP servers should get authorized via OAuth by, you know, the controlling authorization server in the middle.

This should be pretty simple, right? Well, nothing with OAuth is ever so simple, so let's take a look at the state of authorization in MCP. We're going to look at where it started, where it is now, and then where it's going in the future. So, the first version of MCP, it's a pretty young protocol.

It's like seven months old to the day, I think. The first version I like to call the no-auth version. It didn't have any authorization in it at all, which they admitted in the spec. It was really a way to get something out there, primarily for local MCP servers. There was some notion of remote MCP servers, but, again, no authorization.

But this kind of spurred discussion. People saw the promise of MCP and started discussing how to add authorization to it. Now we have the latest draft of the specification, which was published in late March. I like to refer to this as OAuth the first attempt, and for anyone who has ever done OAuth implementations, the first attempt is always pretty poor, and that is the case with this version of the specification of MCP.

I don't actually recommend anyone read the authorization part of the MCP specification as it is today because you'll walk away with a pretty misinformed view of what OAuth is. But as a quick recap of what it does, it says, OK, MCP client's got to implement the client side of OAuth.

That makes sense. And then it also says, MCP servers, you need to implement all of OAuth too, including authentication, token issuance, et cetera. Now, OAuth has three roles. Where's the third role here? What happened to the authorization server? Well, it got collapsed into the MCP server, which is a bit odd.

And people started noticing this. So five days after the specification was released, a blog post went viral. This one from Christian Posta saying, "The MCP authorization spec is a mess for the enterprise." And he states, you know, "The problem here is that it treats the MCP server as both a resource server and authorization server." Aaron Parecki, who does a lot of great OAuth standards work, followed this up with another blog post that went viral titled, "Let's fix OAuth and MCP," where he noted that, you know, a bunch of the confusion that was happening was because the diagram showed that the MCP server itself is handling authorization.

Now, then this kind of culminated in a PR to the specification where people proposed, "Let's fix this problem. Let's just shift the MCP server to be an OAuth resource server and everything will be good." This was a super interesting PR to read. There's like 400-some comments on it. It's not even the only PR there, but it's just an example of how people picked up on this problem and ran with it.

Now, I'm not usually one to say I told you so, but all the way back in January of this year, I commented in a review of the specification, "Hey, I recommend we model MCP servers as resource servers from an OAuth perspective." I'm not quite sure where that got lost.

It didn't get picked up, but in any case, we fixed this problem, and one of the reasons I'm here is to tell you all more about the OAuth things we need to pay attention to in order to avoid this problem in the future. So, okay, the next attempt. In the draft, all this feedback has been incorporated, the MCP spec is fixing its issues, and the draft version of the specification models all of OAuth pretty cleanly and pretty nicely.

The OAuth authorization server is a totally separate entity, and this is really beneficial for all of you building MCP servers because your job gets a whole lot easier. All you have to do is verify the tokens that come in over HTTP and hand off all the other responsibility to the OAuth server.
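In code, the MCP server's entire OAuth job can be sketched like this. The bearer-header handling is real; the verifier is a toy stand-in (a real server checks the token against the authorization server's published keys, plus issuer, audience, and expiry), and all names and token values are made up:

```typescript
// The resource-server side of MCP auth in miniature: pull the bearer token
// off the request and hand it to a verifier.
interface TokenInfo {
  active: boolean;
  sub?: string;
  scope?: string;
}

type Verifier = (token: string) => TokenInfo;

function authorize(headers: Record<string, string>, verify: Verifier): TokenInfo {
  const auth = headers["authorization"] ?? "";
  if (!auth.startsWith("Bearer ")) return { active: false };
  return verify(auth.slice("Bearer ".length));
}

// Toy verifier: accepts exactly one known token, for illustration only.
// A real one validates a signed token against the OAuth server's JWKS.
const toyVerify: Verifier = (t) =>
  t === "good-token" ? { active: true, sub: "agent-1", scope: "tools:read" } : { active: false };

const ok = authorize({ authorization: "Bearer good-token" }, toyVerify);
const bad = authorize({ authorization: "Bearer forged" }, toyVerify);
```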

So, we're back to a pretty good place with respect to OAuth and MCP, and in particular how we authorize connections between MCP clients and MCP servers. So, let's talk about the future. If this is all we do with OAuth, we're not even scratching the surface of what we need in order to fully secure AI and AI interactions.

So, what else are we going to need? We're going to burn through this here pretty quick. The first is agent-to-agent communication. So, what we've seen with OAuth so far as it's applied to MCP, like I said, that's referred to as the authorization code flow, and it's particularly relevant for when we want to do end-user delegation.

But there's a whole bunch of other flows in OAuth that are relevant, in particular client credentials, and this applies when we want agents to communicate with other agents or other MCP servers on their own behalf, not on behalf of a user. So, this is one thing to pay attention to.
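A client credentials exchange is a single back-channel request, with no browser and no user involved. Here's a sketch of building one; the endpoint, client ID, secret, and scope are placeholders:

```typescript
// One POST to the token endpoint: the agent authenticates as itself and
// receives an access token for its own use.
function clientCredentialsRequest(
  clientId: string,
  clientSecret: string,
  scope: string
): { url: string; body: URLSearchParams } {
  const body = new URLSearchParams({
    grant_type: "client_credentials",
    client_id: clientId,
    client_secret: clientSecret, // HTTP Basic auth is an equally common alternative
    scope,
  });
  return { url: "https://auth.example.com/token", body };
}

const req = clientCredentialsRequest("agent-42", "s3cret", "mcp:invoke");
```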

Next, this kind of raises the question of agent identity. What should we do about this? Well, if anyone's ever done OAuth development, you're probably familiar with this type of flow: you want to build an application, you want to integrate with an API, you go to some developer portal, create a new application, get a client ID and secret, and then somehow configure your application with those credentials.

So, this is a bunch of friction. This obviously won't apply well to MCP, which is trying to be a standard protocol, and you want to bring tools and agents together that may not be aware of each other. You can't do this if you presuppose some sort of registration process.

So, what does MCP do? Well, it picks up what is known as dynamic client registration. What this does is allow applications and agents to request credentials at runtime rather than ahead of time in manual registration. So, an agent says, hey, this is who I am. Give me a client ID and secret.

The server does it, and the agent goes about the rest of its OAuth flow. Now, this specification has been around for about 10 years, and in practice has seen essentially no meaningful adoption. And one of the implications of this is that it makes all agents anonymous, because the registration request itself is uncredentialed.
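Mechanically, a dynamic client registration request is just an unauthenticated POST of client metadata, answered with a freshly minted client ID. A sketch, with field names following RFC 7591 and all values made up:

```typescript
// A dynamic client registration request body; the server mints a
// client_id (and possibly a secret) in response.
interface RegistrationRequest {
  client_name: string;
  redirect_uris: string[];
  grant_types: string[];
  token_endpoint_auth_method: string;
}

function buildRegistration(name: string, redirectUri: string): RegistrationRequest {
  return {
    client_name: name,
    redirect_uris: [redirectUri],
    grant_types: ["authorization_code", "refresh_token"],
    // "none" marks a public client; note that nothing in this request
    // proves who the agent actually is.
    token_endpoint_auth_method: "none",
  };
}

const reg = buildRegistration("My Agent", "http://127.0.0.1:3000/callback");
```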

This makes it hard to build trust in agents. It's probably not super viable, in my opinion. So, what should we be looking at instead? Well, there's many cases where we just want to use public clients, where we don't really care about verifying their identity. In this case, there's an emerging specification called pushed client registration, which introduces a kind of well-known string to identify a public client.

We can just use this well-known string, and we skip the whole registration song and dance and the need to store the resulting state. So this is a lot simpler. It also has the capability to carry certain client metadata in the request, if that's necessary. So, this is something we should look into for cases where public clients apply.

But what about clients that we actually want to authenticate and verify their identity? Well, my proposal here is that we should start looking at using URLs and PKI for identity. This lets us reuse the existing identifiers that people already associate with the apps they're using, and repurpose them into the agent world.

In practice, this looks like having a URL, such as, you know, agent.com, used as a client identity in OAuth flows. And then, through the magic of cryptography and key sets, we can authenticate these agents by having them sign JWT assertions or HTTP message signatures that we can then verify with the corresponding public keys.
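Here's a rough sketch of that idea: the agent's URL is its identity, and a signed JWT assertion proves control of the corresponding key. Every URL and value here is illustrative, and both key halves live in one process only for the demo; in practice the server would fetch the public key from a JWK Set published at or discoverable from the agent's URL:

```typescript
import { createSign, createVerify, generateKeyPairSync } from "node:crypto";

// Illustrative key pair; a real agent holds the private key and publishes
// the public half at its identity URL.
const { privateKey, publicKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });

const enc = (o: object) => Buffer.from(JSON.stringify(o)).toString("base64url");

function signClientAssertion(clientUrl: string, audience: string): string {
  const input = `${enc({ alg: "RS256", typ: "JWT" })}.${enc({
    iss: clientUrl, // the URL itself is the client's identity
    sub: clientUrl,
    aud: audience,
    exp: Math.floor(Date.now() / 1000) + 60, // short-lived by design
  })}`;
  const sig = createSign("sha256").update(input).sign(privateKey, "base64url");
  return `${input}.${sig}`;
}

// The server verifies the signature and reads the issuer back out.
function verifyClientAssertion(assertion: string): string {
  const [h, p, s] = assertion.split(".");
  if (!createVerify("sha256").update(`${h}.${p}`).verify(publicKey, s, "base64url")) {
    throw new Error("bad signature");
  }
  return (JSON.parse(Buffer.from(p, "base64url").toString()) as { iss: string }).iss;
}

const assertion = signClientAssertion("https://agent.example.com", "https://auth.example.com/token");
```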

All right. This dovetails into agent attestation. We've connected our agents to the resources that we're using, but then that agent turns around and sends all that information up to an LLM. This seems like something we should probably have some awareness of and control over. So, in kind of protected environments, we can sort of get by, like treating the LLM as just another API, which often it is.

And this is a technique we could apply, but it has limited capabilities when we look at edge-deployed agents, such as on desktops or mobile devices, where we don't really control the software environment. So, there's a bunch of interesting work going on in the IETF now with respect to remote attestation and supply chain security, where we can start to attest to the state of the device and the software running on it, know what LLMs our data is going to wind up in, and then incorporate that into OAuth authorization flows.

Next up, transactional authorization. What we've done to date in OAuth is introduce scopes. This is a whole lot better than passwords, which OAuth kind of replaced back in the day, in the sense that now we can do more fine-grained permissions, such as read versus write access. But in practice, these end up being a little bit too coarse-grained for a lot of use cases, and oftentimes a little bit longer-lived than we might like.

In agent interactions, we're going to have to be increasingly transactional. So, imagine use cases where you want agents to do financial transactions or commercial transactions. We're going to want to authorize things on a transaction basis, potentially with specific amounts or financial budgets. So, we're going to have to look at moving to more dynamic access in this respect.
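One concrete shape this dynamic, transaction-scoped access can take is a structured `authorization_details` parameter that replaces a flat scope string with typed, parameterized grants, in the style of the rich authorization requests work (RFC 9396). The payment example below follows that RFC's running example; the amounts and names are made up:

```typescript
// A structured authorization_details array: instead of scope=payments,
// the client asks for one specific, bounded transaction.
const authorizationDetails = [
  {
    type: "payment_initiation",
    actions: ["initiate"],
    instructedAmount: { currency: "USD", amount: "250.00" },
    creditorName: "Example Vendor",
  },
];

// It travels to the authorization endpoint as one JSON-encoded parameter,
// alongside the usual authorization code flow parameters.
const params = new URLSearchParams({
  response_type: "code",
  client_id: "agent-42",
  authorization_details: JSON.stringify(authorizationDetails),
});
```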

There's a proposal that's actually a specification at this point, called rich authorization requests, which is worth looking into, and something that we can take inspiration from or adopt directly for these use cases. Next up, we have chain of custody. This is particularly interesting to me. What we talk about with MCP really covers the first leg of this.

On the left-hand side, we have authorized connections between agents and MCP servers. But what happens on the right side is completely unspecified in terms of, like, the security profile. So, how do we protect an MCP server that calls another API within the same domain in particular? There's a technique called OAuth token exchange that I recommend everyone look into.
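Mechanically, a token exchange (RFC 8693) is another POST to the token endpoint: the MCP server presents the token it received from its caller and asks for a new token scoped to the downstream API, preserving the chain of custody. A sketch with placeholder values:

```typescript
// An RFC 8693 token exchange request body: trade the inbound subject_token
// for a new token aimed at a downstream audience.
function tokenExchangeRequest(subjectToken: string, downstreamAudience: string): URLSearchParams {
  return new URLSearchParams({
    grant_type: "urn:ietf:params:oauth:grant-type:token-exchange",
    subject_token: subjectToken,
    subject_token_type: "urn:ietf:params:oauth:token-type:access_token",
    audience: downstreamAudience,
  });
}

const exch = tokenExchangeRequest("token-from-client", "https://internal-api.example.com");
```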

A special case of this is MCP servers to third-party APIs. In this case, we should look into identity chaining across domains and its corresponding specification, the identity assertion grant, which lets us do cross-domain authorization in the backend. Somewhat outside the scope of OAuth is other internal infrastructure that people should be aware of as they look to deploy these agents.

And then the culmination of this is really agent-to-agent flows, where I don't know how much of this is happening in practice today, but people see the promise of it. Imagine big graphs of agents talking to other agents on other servers. We're going to need end-to-end visibility as the authorization flows along these graphs.

Finally, async interaction. I think one of the key things to look at here is, like, OAuth typically assumes a user is sitting in front of a browser and relatively static. But as we kick off flows, users might walk away and agents do work in the background. They're going to need a way to reach out to the user and say, hey, I need a bit more access than I've been permissioned.

How do we think about bringing more real-time interactions via channels like SMS or push notifications rather than just browser-based flows? And then a hot topic. Today, there's a bunch of interesting work going on in the voice track at the conference. As AI starts to interact with us via voice and video or completely in the background, how do we think about security in those respects?

This is really the frontier of security and interaction, but there's a lot of prior art in various real-time communities around SIP, XMPP, WebRTC that I think is very interesting for us to all look at. So, there's a lot here. Let's go build this stuff. It's all important for us to achieve a safe and secure AI future.

This is what we're building at Keycard. We're building an identity and access management platform that lets you connect your co-pilots, custom agents, and third-party agents to all your apps, services, and infrastructure, all using standards-compliant protocols: A2A, MCP, and OAuth. If building this stuff is interesting to you, we are hiring, so get in touch with me.

And if it's not interesting to you, but you know you want to secure your agents, get in touch with us, too. We're looking for partners that are building so that we can work with you to secure your agents. The website is keycard.ai, and I will be around the rest of the conference.

Thanks. We'll see you next time.