- - Hello, everyone. I'm Rita, I'm the VP of Product for Cloudflare's developer platform. So, Workers and Durable Objects, thank you for the shout outs. I always like to start by talking a little bit about Cloudflare's mission and especially our mission for developers. And I saw a couple of hands go up earlier from people who have used Cloudflare Workers before.
But actually, if you're sitting in this room, whether you've signed up for Cloudflare directly or not, you've 100% used Cloudflare before because about 20% of internet traffic flows through Cloudflare. So, if you've ordered an Uber recently, or maybe even ordered some food, you've absolutely used Cloudflare. But, aside from Cloudflare's CDN, DNS, DDoS services, we do also offer services to developers, including functions that you're able to run, storage, compute, AI inference, spanning many, many things.
And our vision for developers is to make it as easy as possible for someone to bring their idea to life from the moment that they write their first line of code to deploying it to production, to making it live for the first user, to the millions that come after that.
So, that's what I do. It makes my job really exciting to wake up in the morning and see what developers are going to build. Now, if you're in this room, I don't need to tell you that AI is a big technological paradigm shift, like cloud, mobile, or social before it.
I think everyone here is already convinced of that. But it is interesting to see just how quickly things are moving because I think that it's a good reflection of how quickly things are about to move next. So, I realized that I gave a talk about a year ago where I was pulling up some stats and looking at where we were at.
And so, a year ago, about 44% of developers were using AI as a part of their day-to-day to help them write code. And Gartner was predicting that by 2030, about 50% of knowledge workers would be using AI to augment their work. And these numbers seem really, really low now, right?
Like, today, over 75% of knowledge workers use AI to augment their work. So, this is already surpassing the 2030 estimates that were given. And more than 76% of developers use AI as a part of their development process. And I think that, honestly, from the time that this report was pulled to now, that number has grown even more.
The other interesting thing was that about a year ago, when we were talking about workloads, we were primarily talking about workloads in AI that involve training. And we predicted then that workloads were going to shift towards inference. And again, we've been seeing that unfold. So, we saw that with OpenAI's o1 model, which shifted more and more compute from training to post-training and inference.
We saw a similar thing, actually, with DeepSeek, who optimized training so much that more and more energy is spent on the inference part of it. But let's talk about what's next. So, after training and inference comes, I think, actual automation. And I know there's been a lot of talk about agents the past couple days, but this is the reason that this is so exciting, is that we have the opportunity to not just augment people's work, right?
You've been able, for some time now, to go somewhere like ChatGPT and ask it, like, "Hey, help me draft up an email." But what's really, really powerful is to be able to go and say, "Hey, I have a campaign I want to run. Grab me a full list of the customers that I talked to this week at the conference.
then draft me up the email, then actually, I do want to review it before it goes to a customer, so do send it to me for approval, and then ping me when the customer responds." And so, these are exactly the types of agentic workflows that I think we're going to see more and more, that are really going to unlock that next level of productivity.
And we're already starting to see these agents out in the wild and really meaningfully impacting businesses. So, some businesses are seeing 20% revenue increases already as a part of starting to adopt agents as a part of sales automation. Some businesses are seeing 90% faster response times to support when using AI agents.
And in general, people are seeing about 50% to 75% time savings when using agents. So, agents are going to be even more meaningful, but are already reshaping the way that we work. Okay, but you want to build an agent, where do you start? What all goes into building an agent?
The way that I like to think about agents really comes down to these four components. So, first you have the client, the interface through which a human interacts with the agent, right? Then you have the AI, the reasoning piece, the thinking part that's going to come up with the logic of what are we about to execute, what are we going to do next?
Now, the thinking part needs its executive branch, right? It needs a way to go and execute the actions that it decided it was going to take. So that's the workflows. And then workflows also need access to tools. So, it's not just enough to be like, okay, I'm going to go and do this.
They need access to the tools to actually take the actions. So, let's run through a quick example of what would it look like, that CRM agent that I was just showing, if I were to go and build something that helped me contact people that I talked to, what would that look like?
So, the first part is, if I wanted to have something that works over voice, where I can be like, hey, do this for me, you need something that connects over WebRTC, you then need a speech-to-text model to translate what you said back into text. Alternatively, we're all familiar with chat UIs, right, so you need somewhere to host that.
Then, ideally, you're using some sort of gateway to do caching and to run your evals to make sure that, as you're iterating on the overall process, that things are getting better and better. And then you need to send that response to an LLM that's going to do the thinking part and come up with the rest of the plan.
From there, you need a workflow agent. So, that's what's going to keep track of what actions have been executed and what actions need to take place next. And then, again, you need to connect to your tools. It can be a web browser, it can be an API, it can be an internal service that you need to connect to, or it can be a vector database if you need to grab additional knowledge that that agent needs access to.
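To make those four pieces concrete, here is a minimal, dependency-free TypeScript sketch of the loop they form: the client sends a message, a model plans the next action, the workflow executes it against a tool, and the result feeds back in until the model decides it is done. The `plan` function is a hard-coded stand-in for a real LLM call, and the tool names are made up for illustration.

```typescript
// A tool is just a named function the agent may call.
type Tool = (input: string) => string;

interface Action {
  tool: string | null; // null means the model is done and `input` is the final answer
  input: string;
}

// Stub "model": in a real agent this would be an LLM call that plans the next
// step. Here it looks up a customer list, then drafts an email from it.
function plan(history: string[]): Action {
  if (history.length === 1) return { tool: "crm.listCustomers", input: "this week" };
  if (history.length === 2) return { tool: "email.draft", input: history[1] };
  return { tool: null, input: history[history.length - 1] };
}

function runAgent(userMessage: string, tools: Record<string, Tool>): string {
  const history: string[] = [userMessage];
  for (let step = 0; step < 10; step++) {        // cap steps to avoid infinite loops
    const action = plan(history);
    if (action.tool === null) return action.input; // model says we're finished
    history.push(tools[action.tool](action.input)); // execute the tool, feed result back
  }
  throw new Error("agent did not terminate");
}

const tools: Record<string, Tool> = {
  "crm.listCustomers": () => "Ada, Grace",
  "email.draft": (names) => `Draft email to: ${names}`,
};

console.log(runAgent("Run my campaign", tools)); // → "Draft email to: Ada, Grace"
```

Everything else in the diagram — the gateway, the speech-to-text layer, the human in the loop — slots in around this same loop.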
Sometimes, you're also going to need a human in the loop to verify some of these actions that you're taking. So, how do you build an agent? I'm actually going to go backwards here and start with the tools part. And most recently, there's been a lot of talk about MCP.
So, the amazing thing is that Anthropic introduced this new standard back in November. And I think the really interesting thing about it is that it really got people thinking about, okay, how do we expose APIs to LLMs in a way that allows us humans to talk to LLMs over natural language?
But I think the real missed headline of MCP was actually that LLMs became really, really good at tool calling. This wasn't so much the case a few years ago if you tried to play around with tool calling. But now they are. And so, we have this new standard for how you can actually write out your code in a way that's going to be incredibly easy to consume by any MCP client.
And so, the, again, really cool thing about MCP is that it does respect a traditional client server architecture where you're able to have that conversation back and forth. And importantly, have more than one client that connects to the MCP server. So, these are some of the core concepts that go into MCP.
MCP servers generally have resources, prompts, tooling, and sampling. Resources can be anything from file contents and database records. Prompts actually help you define how you want someone else to interact with your agent because you can actually prompt your agent probably better than anyone else can. If there are any nuances about how your system works, you want to build that into it as much as possible.
Then you want to give it access to the actual tooling, right, and connect those queries with the tools. And then last but not least, sampling. Interestingly, the conclusion I came to while preparing this talk is that I actually haven't seen anyone using sampling in production in an MCP server yet.
But the idea is to allow you to kind of use shorthand with your LLM and allow it to complete some of the thinking behind it. But building MCP does come with some tricky parts. And I think the trickiest parts are, first of all, the transport protocol over SSE and WebSockets, the OAuth part, and the memory part.
But I'm going to share a cheat code with everyone here. So, get ready. I'm going to like flash it real quick. Oh, you missed it. No, I'm just kidding. So, Cloudflare has an SDK called Agents that you can install that will actually give you a lot of this functionality out of the box.
So, we released Agents SDK a few months ago. And yes, it has the same name as the one that OpenAI just released a few days ago as well. But the two actually play with each other really, really well. But I'll tell you a little bit about what it does.
So you can use Agents SDK, first of all, to run MCP servers. And it comes with a built-in class called McpAgent that allows you to host your remote MCP servers with OAuth, with transport, with HTTP streaming, all built-in. So, if you're one of those people that never wants to touch OAuth again, this allows you to do that.
The really cool thing is that it has state management built into it because Cloudflare has this primitive called durable objects. And so, durable objects, the idea is basically, it's kind of like a serverless function, but with state attached directly to it. So, if you've ever wanted to write some code, but then save the state of it without ever having to set up a database or anything like that, this is a really, really great way to do it and makes it really easy to build these MCP servers.
It comes with real-time WebSocket communication, so that makes the whole chat interface thing really, really easy. React integration hooks, so you can integrate it into your front-end really easily, and basic chat capabilities. So, let's walk through what it would actually look like to deploy an MCP server on Cloudflare.
So, first, I can define my MCP class that extends McpAgent, which I was just talking about. And this MCP server is going to be kind of like a Goodreads server that's going to recommend different books to us. So, we're going to set an initial state that's empty, but then I can give it a tool that's called Add Genre.
So, I can start to specify my preferences. I'm a big Patricia Highsmith fan, so I can say, you know, I really like thrillers. And it's going to save it and persist it for future interactions. And then I can have a separate tool called Get Recommendations that's going to get book recommendations.
And you can have, so we were talking about MCP prompts before. You can have a personalized prompt for recommending books to someone who likes the genres, right? And has read the books that you've previously specified that you read. And so, it's a really good way to get these personalized recommendations.
And every time that you interact with this tool, it's going to persist the memory across every single interaction. So, the recommendations are going to keep getting better and better. And because this MCP server is standalone and can be interacted with through various clients, the memory is actually going to persist regardless of the tool that you're using to call into it.
Now, why is this great? It's amazing because traditionally you would have to separately set up a database, manage connections, handle scaling. There would be added latency in the setup. Versus with MCP agent, because the memory part is built into it, you don't have to do any of that. And it's going to scale automatically.
It's going to run close to your AI agent. And you don't really need to think about infrastructure at all. You just get all of that out of the box. You can actually, so we have a blog post up. You can go and deploy your first MCP server today. It's really, really easy.
There is literally a deploy to Cloudflare button. It takes less than a minute to get your initial MCP server up and running. And what's been really cool is working with some of the brands that we respect so, so much and seeing companies like Atlassian, Asana, Stripe, Intercom building their own MCP servers in this exact way.
So you're actually going down a really, really well trodden path here. Okay. So that was the tools part. So let's keep working backwards from there. So we're giving our agents access to tools, but now we need a coordination component, right? A workflow that's going to maintain a state not through just that one tool interaction, but through the entire chain with perhaps a human in the loop.
So human in the loop workflows require you to have really long running tasks that sometimes need to talk to an LLM. It might be a reasoning LLM that takes several minutes to come up with a response. And similarly, if you're talking to a human in the loop, a human could take minutes, hours, days, months to respond.
And so you need something that's going to be able to come back and resume its flow after that task is completed. You also still need to consider things like WebSocket servers, state persistence, retries, horizontal scaling. These things can get quite tricky. So again, let's walk through a real use case that we built out with a customer.
There's a company called Knock. They do notification management. And they needed to provision an agent that would handle approvals: you request a new credit card, right? And then, you know, your boss needs to go and approve it through, you know, it can be an email, Slack, in-app notification.
So what do we need to do in order to do that? First, we need to allow users to request a new card through a chat interface. So you can see that here we're importing useAgent from the agent's React library. And then we're going to create a new instance of chat that's going to have all of these things instantiated on our behalf.
And this is all part of Agents SDK. Then we need to give it an ability to issue cards through this issue card action. But we need to wrap it in the require human input tool in order to delegate that piece to Knock. So we want to make sure that the issue card tool always requires the human input.
Then we need Knock to send our approval notifications and defer the tool call to issue the card until there is approval. Right? So we have a tool call to get a new card provisioned. But we want to stall that on the actual approval. So you can see that in here where we're going to route the messages to approve something.
Now, once something is approved, we need to then route it back to the appropriate agent. And this is going to be automatically handled by the durable object and instantly routed back to the correct agent. So you can see in here that I'm going to find the user ID from the tool call for the calling user.
And then I'm going to be able to look it up so I can get the agent by name by the user ID in here. And so then if it's an existing agent, we're going to route it to the correct durable object and make sure that we're handling it with a correct webhook.
We then need to resume the paused tool call, issue the card, and let the user know that the card was approved. Right? So in here, if we received an approved status, then we can move on with the deferred tool execution that we defined earlier. And then last but not least, we need to make sure that duplicate actions don't occur, right?
So if two things happen out of sync, we can't approve the card twice or we can't provision the card twice. And so this is where, again, that state management becomes really, really important. And we're able to store all of this directly in the state here. So you can see if, you know, if the card has been requested or processed already.
And then if it's been approved, we're going to set the status. So when a new webhook comes in, we can't reapprove the same exact one. So we talked about tools, we talked about workflows. Next, you need the reasoning piece of this and need to choose the right model to run this.
I'm actually going to skip this part because there's an entire conference that's dedicated to this today of people that are going to cover this way better than I will. Actually, Logan's talk this morning about everything that's happening with Gemini was really, really good. There's a bunch of people talking about evals.
But then you still need a client in order to connect to your server, right? And again, this is the really beautiful thing about MCP is that once you built out your MCP server once, you can truly meet your users where they are. And realistically, the nice thing is you actually don't have to build a UI yourself at all.
If your users are developers, most likely they're already using Cursor. And so now that Cursor supports remote MCP servers, you just import your MCP server and have your clients be able to interact with it. Similarly, Claude and ChatGPT both support remote MCP servers. So your users, again, can start using your agents instantly directly through there.
But you can also build your own app and your own MCP client. And I think this is where you can build really, really interesting agentic workflows when you do have more control over both the client and the server and connecting these two pieces together. And not only that, but your app doesn't actually have to be limited to just being a user interface.
You can also talk to your MCP client over voice, especially with some of the Cloudflare tools that we have built out that help translate WebRTC to WebSocket in a way that really makes it easy to build out these applications, because the MCP client can easily understand those connections. So yeah, how do you build an agent?
These are the four different pieces you need, your client, your AI, your workflows, your tools. And if you want to get started and don't know where to start, I really, really highly recommend the agents SDK. You'll be able to get up and running in just a few minutes. Yeah, so thank you.
Thank you.