Building Agents at Cloud Scale

00:00:00.000 | Hi everyone, I'm thrilled to be back on stage here again at the AI Engineer World's Fair,

00:00:29.040 | and it's amazing to see this community grow. So today, I'm going to speak about how we can build agents at CloudScale.

00:00:39.040 | Now, at Amazon and AWS, we truly believe that virtually every customer experience we know of will be reinvented with AI.

00:00:51.040 | And not just the existing experiences, but there will also be brand new experiences we are now able to build

00:00:58.080 | the help of AI agents. And we're not just theorizing about this, right? We're all here together to actually build the future.

00:01:10.080 | Now, I want to start just with a little bit of what that means internally across Amazon as a business. At Amazon, we have over 1,000

00:01:22.080 | and we're talking about generative AI applications that are either built or in development transforming everything from how we forecast inventory to how we optimize delivery routes to how customers shop and how they interact with their homes.

00:01:40.080 | And one of the most ambitious deployments of AI agents is the complete reimagining of Alexa. And I know many of us have been waiting for this for a long time.

00:01:52.080 | So what you're about to see here represents the largest integration of services, agentic capabilities, and LLMs that we know of anywhere. So let's have a brief look.

00:02:06.080 | Oh, hey there. So we can just like talk now. I'm all ears, figuratively speaking. Do you know how to manage my kids' schedules?

00:02:24.080 | I noticed a birthday party conflicts with picking up grandma at the airport. Want me to book her a ride?

00:02:29.080 | Billie Eilish is in town soon. No way.

00:02:32.080 | I can share when tickets are available in your city. Yes, please.

00:02:35.080 | Got any spring break ideas? Somewhere not too far. Only if there's a beach. And nice weather.

00:02:40.080 | Santa Barbara is great for everyone. I found a restaurant downtown I think you'd like.

00:02:45.080 | What is Santa Barbara known for? It has great upscale shops and oceanfront dining. Can you go whale watching?

00:02:51.080 | Absolutely. Want me to book a catamaran tour? What's the next step? Remove the nut holding the cartridge.

00:02:57.080 | Should I get things? You might only love them for a little while. You're probably right.

00:03:01.080 | Make a slideshow of baby T-neck. Mom. What part am I looking for again?

00:03:06.080 | Two-inch washers. Your Uber is two minutes away. For real? Wait, did someone let the dog out today?

00:03:12.080 | I checked the cameras. And yes, in fact, Mozart was just out.

00:03:16.080 | Wow. Wow. Look at my style. I know you ain't seen it like this in a while.

00:03:21.080 | Wow. Wow.

00:03:23.080 | I love sharing this video because it shows really the power of agents at scale.

00:03:33.080 | And just to have a quick look what that means in terms of numbers.

00:03:38.080 | We have over 600 million Alexa devices now out in the world.

00:03:44.080 | And with the help of the latest advancements in AI, we were able to really reimagine this experience.

00:03:51.080 | Alexa Plus works through hundreds of specialized expert systems.

00:03:57.080 | That's what the Alexa team calls groups of capabilities, APIs, and instructions to accomplish a specific task for you.

00:04:06.080 | And all of these experts also orchestrate across tens of thousands of partner services and devices to get the things done.

00:04:17.080 | Which you just see in a glimpse of this here in this video.

00:04:21.080 | And we truly believe that the future will be full of those specialized agents.

00:04:26.080 | Each with their own unique capabilities and working together seamlessly with other AI agents.

00:04:33.080 | Now, this example shows what's possible at this massive scale.

00:04:40.080 | But how do we get there?

00:04:42.080 | How do we operate at this scale?

00:04:45.080 | Or said differently, how do we move from web services that we've built for many years now into developing those agentic services?

00:04:55.080 | And luckily, many of the underlying principles remain the same.

00:04:59.080 | Whether you're building for millions of devices.

00:05:02.080 | Whether you're reimagining and integrating AI experiences into your enterprise applications.

00:05:08.080 | Or you're a startup and you're really just looking to kind of scale your idea to the next level.

00:05:13.080 | Now, another example I want to show you is an agentic service that we built at AWS.

00:05:22.080 | You might have heard about Amazon Q Developer, which is our code assistant that helps you really kind of across the software development lifecycle.

00:05:32.080 | And just a few months ago, we released an queue developer agent for your CLI.

00:05:39.080 | So it brings the agentic chat experience into the terminal.

00:05:42.080 | It helps you to debug issues.

00:05:44.080 | You can ask natural questions.

00:05:46.080 | It can read and write files.

00:05:47.080 | And really kind of help to make your day-to-day in the terminal more productive.

00:05:51.080 | So let's have a quick look how this looks.

00:05:54.080 | Here is Amazon Q in the CLI.

00:05:58.080 | And I'll just ask a good question here.

00:06:00.080 | In this case, hey, what do you know about Amazon Bedrock?

00:06:03.080 | CLI is integrated with MCP.

00:06:06.080 | So what it does, it actually figures out there is a tool.

00:06:09.080 | Our AWS documentation team has released an MCP server.

00:06:14.080 | It's connecting to it.

00:06:15.080 | You see the tool is happening.

00:06:17.080 | And it's asking for permissions.

00:06:18.080 | So I give it the permissions.

00:06:19.080 | And then it comes back with a response that is grounded in the official AWS documentation.

00:06:26.080 | Now, I don't want to talk much more about queue, but I do want to ask for you just to quickly think about how long did it take for the AWS internal teams to build and ship this agentic service.

00:06:42.080 | And let's just do it with a quick raise of hands.

00:06:44.080 | Who think it took two months to develop and ship this?

00:06:47.080 | A few hands.

00:06:49.080 | Who thinks three weeks?

00:06:51.080 | All right.

00:06:52.080 | It's a bunch of more hands.

00:06:54.080 | Who do you think it took half a year?

00:06:56.080 | Almost none.

00:06:58.080 | Wow.

00:06:59.080 | You folks are great.

00:07:00.080 | We built and ship this within three weeks.

00:07:04.080 | And to me, this is just almost insane, right?

00:07:08.080 | Like the speed.

00:07:09.080 | And we heard it earlier, like the mode of AI, one of the keynote speakers call it out, is execution, right?

00:07:16.080 | And I think three weeks is super impressive.

00:07:19.080 | Now, how do we enable teams, and not just internally at AWS, but in general, to build and ship production-ready AI agents this quickly?

00:07:31.080 | What we did internally, our teams, we needed to fundamentally rethink how to build agents.

00:07:38.080 | And what we did is we developed a model-driven approach that really kind of taps into the power of LLMs these days and models that are so much more capable in deciding, planning, reasoning, taking actions.

00:07:51.080 | And let the developers focus on what their agents should do, rather than telling it exactly how to do it.

00:08:00.080 | And the great news is, we made it available for all of you to use as well.

00:08:06.080 | So just a few weeks ago, we released Strands Agents.

00:08:11.080 | It's an open source Python SDK, which you can check out and start building and running AI agents in just a few lines of code.

00:08:21.080 | So let me show you quickly how this looks like.

00:08:24.080 | And before I go in here, just a fun fact, if you wonder, why did they call it Strands Agents?

00:08:31.080 | Well, this is what happens if you let AI pick its own name.

00:08:35.080 | All right.

00:08:37.080 | So the reasoning behind, because, again, the AI agent is capable of reasoning,

00:08:42.080 | it came up with, like, think about the two strands of DNA.

00:08:46.080 | And just like the two strands of DNA, Strands Agents connects the two core pieces of an agent together, the model and the tools.

00:08:57.080 | And it helps you building agents.

00:08:59.080 | It simplifies it by really relying on those state-of-the-art models to reason, to plan, and take action.

00:09:07.080 | You can simply start with defining a prompt and your tools in code, and then test it out locally.

00:09:14.080 | And then once you're ready, deploy it, for example, in the cloud.

00:09:18.080 | And this is how simple it is.

00:09:20.080 | Again, just a couple of lines should look pretty familiar.

00:09:23.080 | You install Strands Agents, you import it, and then it comes with pre-built tools, which I talk about a little bit more in detail.

00:09:31.080 | And basically, you just add the tools to your agent, and then you can start asking questions or building more complex workflows with it.

00:09:39.080 | Now, by default, Strands Agents integrates with Amazon Bedrock as the model provider.

00:09:46.080 | So you can check the model config here.

00:09:48.080 | It's using Cloud 3.7 Sonnet.

00:09:51.080 | But, of course, it's not just limited to AWS.

00:09:55.080 | You can use Strands Agents across multiple providers.

00:09:59.080 | For example, we have integrations with the Llama, so you can start developing locally, testing it out.

00:10:05.080 | We have integrations, entropic-edit integrations, meta-edit integrations to the Llama API.

00:10:11.080 | You can use OpenAI models and any other providers available through the integration with Lite LLM.

00:10:16.080 | And, of course, you can also develop your own custom model provider.

00:10:20.080 | Now, quickly on the tools.

00:10:25.080 | As I said, Strands Agents comes with over 20 pre-built tools.

00:10:29.080 | So anything from simple tasks like, hey, I just want to do some file manipulation, some API calls,

00:10:35.080 | obviously integrate with AWS services, but then also more complex use cases.

00:10:41.080 | And I just want to call out a couple of them.

00:10:44.080 | So there's a whole group of integrated tools from memory and RAC.

00:10:48.080 | One tool specifically called Retrieve, which lets you do a semantic search over a knowledge base.

00:10:56.080 | And just to show you the power of this, we have an internal agent at AWS that manages over 6,000 tools.

00:11:05.080 | Now, 6,000 is a hard number of tools to put into a single context window and give one model to decide.

00:11:12.080 | So what we did is we put the descriptions of those tools in a knowledge base and use the retrieve tool here.

00:11:19.080 | So the agent can find the most relevant tools for the task at hand and only pull those tools back into the model context for the model to decide which one to take.

00:11:29.080 | So that's just one use case how we're leveraging that.

00:11:33.080 | Also, there is support for multi-modality across images, video and audio with strands.

00:11:41.080 | There is a tool to kind of prompt for more thinking and deep reasoning.

00:11:47.080 | And it also comes with pre-built tools to implement multi-agent workflows, whether it's graph-based workflows or a swarm of sub-agents working together.

00:12:00.080 | Now, you cannot talk about tools without mentioning MCP, right?

00:12:05.080 | So obviously, we integrated MCP here natively within strands.

00:12:09.080 | So you can just use this also to connect to thousands of available MCP servers and make them available as tools for your agent.

00:12:18.080 | Support for A2A is also coming soon.

00:12:21.080 | But let's start and talk a little bit about MCP first.

00:12:26.080 | If you're building on AWS already, make sure to bookmark this GitHub repo.

00:12:31.080 | It's .slab/mcp.

00:12:33.080 | And here you can find a very long list, much longer than you would see here on this slide,

00:12:38.080 | of a growing number of the MCP server implementation, specifically if you're working and building on AWS.

00:12:45.080 | Now, one of the challenges stems from the fact that once we all started building MCP servers,

00:12:54.080 | what we had was standard IO, right?

00:12:57.080 | So it started out to help locally connect your systems, your clients, to respective tools.

00:13:04.080 | And here's just a quick example, which is important for a demo I'll show in a little bit.

00:13:09.080 | This is just a standard IO implementation of an MCP server.

00:13:13.080 | It should look familiar to most of you working with MCP using the Python SDK, using Fast MCP.

00:13:19.080 | All I'm doing here is set up my server and using the decorator to define a tool.

00:13:24.080 | In this case, my tool is to roll a dice.

00:13:29.080 | And you might see in the code here, it has an input to define the number of sites.

00:13:35.080 | And I had to put a picture here because I have to admit, I just learned this myself.

00:13:40.080 | Do we have D&D fans in the room?

00:13:42.080 | Woo-hoo!

00:13:44.080 | All right, a few of them, so you all know what I'm talking about.

00:13:47.080 | For the rest of us, I just learned there are dices, and I have one here.

00:13:53.080 | I'm not sure if the camera can catch this.

00:13:55.080 | It's just one of them here on the slide.

00:13:58.080 | A dice that has, for example, this one has 20 sites.

00:14:01.080 | Something very normal in the D&D world to start, I think, your game.

00:14:06.080 | Don't ask me questions about D&D.

00:14:09.080 | My colleague Mike Chambers, who is either here or in the expo right now,

00:14:12.080 | he built the demo, so kudos to him.

00:14:14.080 | And he can answer all of the D&D questions.

00:14:16.080 | All right, just keep that in mind.

00:14:18.080 | I'll come back to this in just a second.

00:14:20.080 | Now, what we want to do here is to decouple and kind of connect to remote MCP servers,

00:14:29.080 | because the topic is to scale, right?

00:14:31.080 | And the way to do this is, in the AWS world, as easy as just deploying it as a Lambda function.

00:14:38.080 | So, we can do this now with streamable HTTP, and the same concepts apply.

00:14:43.080 | You put your Lambda functions, as you would have before, behind an MCP gateway, and then connect.

00:14:49.080 | And because we care about security and authorization, in the quick demo I'm going to show you,

00:14:54.080 | I'm using an authorizer.

00:14:56.080 | You can also plug in a Cognito framework for this part.

00:15:00.080 | And I'm also going to store session data in a DynamoDB table.

00:15:03.080 | So, let's roll this quick demo here.

00:15:06.080 | So, what you see here is an MCP Lambda handler that we developed.

00:15:10.080 | It's available on the GitHub repo, which makes it really easy to kind of set up your MCP server in Lambda.

00:15:16.080 | Here's a very simple Hello World example.

00:15:18.080 | The tool is just, again, defined with a tool decorator in here.

00:15:23.080 | And then, in the Lambda handler function, you can reference the input here, the invoke function, and pass it to that MCP server.

00:15:30.080 | Now, if we're looking at the server implementation, and here we're doing a little bit more,

00:15:34.080 | you can see how we're adding session table support, which is a DynamoDB table.

00:15:39.080 | So, we're defining the tool.

00:15:41.080 | This is the rolling dice tool that I just pointed out, but this time it's hosted as a Lambda function.

00:15:46.080 | You can write all the code you want to have there as well.

00:15:49.080 | And then, at the very end, it's the same single line that basically, when you call the Lambda function, passes this on to the MCP server.

00:15:56.080 | Let's deploy this, and again, we're using the existing tools to deploy Lambda functions as we have before.

00:16:05.080 | So, this one is using AWS SAM to just deploy that to the cloud.

00:16:09.080 | And then, we will receive the API gateway URL as well.

00:16:12.080 | Now, from the client side here, I'm using strands agents, as you can see.

00:16:17.080 | And then, I am using the MCP integration.

00:16:22.080 | I'm passing here my API gateway URL to connect.

00:16:26.080 | For authorization, I have a bearer token.

00:16:29.080 | Again, this is a simple concept demo, but you can build more robust integrations here as well.

00:16:34.080 | I'm calling the list tool, and then I'm passing those tools to my agent, as we've seen before.

00:16:40.080 | This time, it's the MCP available tools.

00:16:43.080 | And then, if we run this here, we can quickly see this in action and basically going to ask it here to roll a dice.

00:16:54.080 | And we're asking it to roll a D20.

00:16:57.080 | So, again, 20 sides.

00:16:59.080 | And it's coming back.

00:17:01.080 | What did we roll?

00:17:02.080 | You can see the tool use kicking in here.

00:17:04.080 | We rolled a seven.

00:17:05.080 | Great.

00:17:06.080 | So, this is just really a quick example.

00:17:09.080 | The good news is, once you're in the AWS world and you're working on Lambda,

00:17:12.080 | everything you can build with Lambda, you can integrate there.

00:17:15.080 | So, basically, you have access, again, to all of the great features, capabilities, applications you might have already built on AWS.

00:17:23.080 | Now, the next step here is, how do we make agents talk to each other, right?

00:17:27.080 | That's kind of the next frontier.

00:17:29.080 | And we're super excited about all the open protocols that are emerging right now.

00:17:34.080 | With MCP, for example, we joined the steering committee.

00:17:37.080 | We're an active part of the community, contributing code, and helping to further evolve MCP.

00:17:43.080 | If you want to learn more about this, here is the QR code.

00:17:46.080 | We have a whole block series started on our open source block.

00:17:49.080 | Feel free to check that out as we continue to help evolve those protocols.

00:17:54.080 | Now, what's next?

00:17:57.080 | We all are aware that this is just the beginning, right?

00:18:01.080 | There will be so much more coming.

00:18:03.080 | And if you had a chance to check out my colleague Danielle's talk yesterday on useful general intelligence, I just want to quote her a little bit.

00:18:10.080 | She said, "The atomic unit of all digital interactions will be an agent call."

00:18:17.080 | So we can imagine a future here where you might just have a personal agent shown like this, connecting to an agent store and really kind of having agents together accomplish tasks for you.

00:18:28.080 | And some of you here in the room might already be building this, right?

00:18:32.080 | So let's go and build this future together.

00:18:35.080 | Thanks so much.

00:18:36.080 | Check out the additional sessions we have.

00:18:38.080 | My colleague, Mike, is going much more into the rolling dice demo, everything MCP and strands.

00:18:43.080 | And my colleague, Sue Montemara, will also have a deep dive on strands.

00:18:47.080 | And with that, thank you very much.

00:18:49.080 | Check us out in the expo hall and grab your own D20.

00:18:52.080 | Thank you.

00:18:53.080 | Thank you.

00:18:54.080 | Thank you.

00:18:55.080 | We'll see you next week.

Building Agents at Cloud Scale — Antje Barth, AWS