Building Agents with Amazon Nova Act and MCP - Du'An Lightfoot, Amazon (Full Workshop)

00:00:00.000 | building agents with Amazon Nova Act and MCP I'm excited today because we're going to build

00:00:25.680 | intelligent autonomous AI systems that can help you build scale and improve your applications

00:00:35.440 | and business my name is the one Lightfoot and I'm joined by my name is the one Lightfoot and I'm

00:00:54.960 | joined by hey I'm Banjo Biyami I'm a solutions architect here at AWS now this is the AI engineer

00:01:03.480 | welfare and I've been in tech over 15 years and right now is the most exciting time for me in my

00:01:13.560 | entire career and one of the reasons for this excitement is agents how many of you right now

00:01:20.640 | are building agentic systems I love it so when we talk about agentic AI I think it's important that we

00:01:31.800 | level set from an AWS perspective there are three key terms we need to think about first the ability to

00:01:41.160 | plan agent gets a prompt it gets an objective and it determines the actions that need to be taken so

00:01:49.680 | creates the plan and then it takes actions on those actions by using things like tools now the last piece

00:01:58.080 | the third piece the third piece and probably the most interesting is the reasoning where the agent is able to evaluate

00:02:06.600 | the results and determine if it needs to update the plan and take additional actions until the objective is complete this is an agent

00:02:16.860 | now when we actually break down the architecture I think it's important to take a look at this because we have the user input we have the agentic system we have the possibility of some type of human in the loop and then we have the generator response

00:02:34.860 | response now when we dive a little deeper there are some components of this agentic system we have the LLM we have a knowledge base with external information that we may want to provide we have guardrails to say to the model don't do this or to ground the model with the truth from our knowledge base to say okay is this actually relevant information is this accurate to the information we're receiving from the knowledge base

00:03:02.860 | and then we have access to additional tools memory or we may need to talk to additional agents or LLMs like Amazon over at do something like MCP and we have the ability to design our own flows for these systems

00:03:18.860 | now the most interesting piece that I think a lot of us are probably focused on when we're building these systems is around the continuous evaluation framework like how do we know if we're using the

00:03:30.860 | right LLM how do we know if our prompt is consistent accurate or even optimized for the performance we're expecting and then how do we even judge our system how do we rate that and determine that that it's actually solving the problems that we need or intend

00:03:46.860 | now once we have this we need to log this information and then have some type of subject matter expert and determine how can we improve this system and this is the iterative approach so we're always trying to improve

00:03:58.860 | and optimize our agent system now continuing on this continuing on with this story now there are some use cases that we should be building these systems for like if it's complex tasks and we don't know which tool should be used how many tools should be used and we want the model to leverage his reasoning capabilities well this is a great use case for agent system but if it's something that is just one step

00:04:26.860 | our traditional if this then that approach is probably the best solution right we don't always need to provide some type of agency system for something that can be done with a traditional solution

00:04:39.860 | now when we talk about agents on AWS there are three approaches and perspectives we should think about

00:04:46.860 | first it's going to be the specialized using something like Amazon Q how many of you are have used Amazon Q the there's Amazon Q in the console to help solve your problems on AWS in the console

00:04:59.860 | there's Amazon Q developer inside of your IDE and right now one that I'm I would think most excited about is Amazon Q CLI agent how many of you have used that

00:05:08.860 | for me if you if you are into increasing your productivity using a CI CLI agent has helped me tremendously from editing the video it can do that summarizing the document reading my entire code base like today for one of my demos I had some code and I was trying to figure out why wasn't it working I said analyze this code and tell me what you see let me know what you see

00:05:36.860 | let me know the API's that it's calling well I looked at the API's well it didn't match my API's in the API gateway so when the code was deployed it wasn't deployed with the right API's so the agent was able to help me save a ton of time by just analyzing the code and tell me what it saw because I never seen the code before right so that's what these tools are able to help us do the next is fully managed if you're using Amazon bedrock you're able to leverage Amazon bedrock agents to build

00:06:04.860 | build and manage agents inside AWS and today what we're going to be focused on is the DIY to do-it-yourself approach by using strands agents this is allows you to not just leverage Amazon bedrock but also leverage models through other providers using light LLM

00:06:24.860 | now when we talk about strands agents strands agent was announced about a month ago I want to say something about a month ago this is open source extremely lightweight

00:06:34.860 | so if you use other agents flat frameworks is like that but the implementation is you'll see in the code how easy it is to build an agentic system or agent itself in a few lines of code and already get started I built a multi-agent solution in about under 50 lines of code

00:06:52.860 | and so when we break down strands agents there are three components we have a prompt we have a LLM and we have tools so you create a function called let's say a get weather tool right you define your agent you give it a prompt and it's already implemented and you'll see in the code as

00:07:14.860 | as banjo goes through here in a moment now taking it a step further as danielle presented today on amazon nova act

00:07:24.860 | these models are able to do some really cool things and this is another thing that i'm excited about amazon nova act is a research preview model and the capabilities of this allows you to use a prompt or give instructions and take complex tasks and do the

00:07:42.860 | and take complex tasks and do things like browse the internet to find research or to research or to search on amazon.com to find the top list of widgets

00:07:52.860 | right and then return them and then add them to your card so you'll see how we can leverage this not just using the sdk for amazon nova act but also by leveraging mcp which leads us into the last piece which i think when we're talking about agents i

00:08:10.860 | i don't think we will be here today as fast as we have moved if it wasn't for mcp how many of you are

00:08:17.900 | leveraging mcp today model context protocol how many of you have built your own mcp servers

00:08:23.580 | i built several um i got two that i use all the time one how many of you use obsidian

00:08:33.580 | okay so for my documentation i built obsidian mcp server this allows me to save all my documents

00:08:41.500 | reference all my documents and just my entire workflow is streamlined because of this mcp server i use right

00:08:48.780 | there but i also use one for my bookmarks i built a bookmark manager because every friday i'm restarting my

00:08:54.620 | i'm restarting my computer and i lose my bookmarks i save them and i forget about them but now i can just

00:08:59.500 | say save this bookmark it gives it a description gives it a title give it a date and i can even add

00:09:04.380 | notes so i can remember where this bookmark so now when i open up qcli i can say hey i'm looking on the

00:09:10.140 | topic i'm looking for um some information on mcp can you tell me all the bookmarks that i have then it'll

00:09:15.260 | find it can you tell me the ones i saved last week and so these this is the power that we have today but

00:09:20.780 | but with that being said i think it's time that we all start building banjo is going to take over but

00:09:27.500 | if you you open your laptops and log on to this link this is going to take you to a workshop environment

00:09:34.060 | where you have access to an amazon account where banjo is going to walk you through building out

00:09:39.740 | today's workshop i thank you for your time cool all right so uh this is going to be a hands-on workshop so

00:09:46.540 | we've provisioned an aws account for everybody here so you don't have to install anything on your

00:09:51.180 | computer everything's going to be done through the browser and i always say the hardest part of the

00:09:56.220 | workshop is just getting started so some of my colleagues are also here so raise your hand aws

00:10:00.220 | folks that are here to support so we're going to take some time to just get logged into an environment

00:10:05.340 | we're going to set up a vs code server enable models get the nova act api key so again this is the

00:10:11.340 | hardest part of the workshop that's getting started so let's take some time to just get into the environment

00:10:16.620 | and i'll follow along as well so and this is uh again everything is you don't have to install

00:10:23.020 | anything on your computer you don't have to use your own aws account everything is provisioned for

00:10:26.380 | you so but while that's loading i'm going to briefly walk through the three modules of the workshop

00:10:30.540 | so the workshop is really about how you can use nova act so the first module is just getting started

00:10:36.140 | with nova act we're going to make an api call that the second part of the module is going to make an mcp

00:10:41.420 | server that can leverage an nova act and finally we're going to use the strands agent to cook

00:10:45.900 | everything together so that's kind of the the three steps we'll go through at this workshop and all the

00:10:50.380 | code is available via the link on github so you can try it on your own as well uh but yeah trying to get

00:10:56.940 | started here if you can't follow along i'm going to be doing up here so don't worry too much and again

00:11:01.740 | all the code is available so you can try it out line offline okay so the first things first uh if you're

00:11:09.820 | following along make sure to click this open aws console button again we provision the aws account

00:11:15.180 | you know don't log into your own aws account don't try to create a new one everything is uh pre-visioned

00:11:20.460 | here already so i'm gonna click clicking that button to open up your aws account so logged into my aws account

00:11:34.940 | so the first thing we do in the aws account we're gonna enable uh amazon bedrock models so

00:11:39.740 | amazon bedrock think of as a serverless api to access different foundation models and you can build lots

00:11:45.340 | of agenda for applications in it so it has capabilities like knowledge bases guardrails you can build agents

00:11:51.580 | on top of it for anything you need to build uh ai agents or agenda applications amazon bedrock has

00:11:57.740 | capabilities for that but for this workshop we're just gonna enable specific models so gonna enable

00:12:04.860 | specific models you can click the amazon models and then we'll use the cloud 37 3.5 iq and 3.5 sonnet

00:12:13.020 | so those are the ones we're going to use for this workshop

00:12:17.820 | and that's going to request access there

00:12:21.020 | and again all the instructions are also in this workshop as well so we could follow along but i'm

00:12:31.820 | just going to go through it just for sake of time and then the next part once we get the model access

00:12:37.500 | there's a vs code server that has everything set up already so i'm just going to go in there

00:12:47.100 | and if the url and password is there you can log into your vs code server with everything installed

00:13:04.220 | i'm also going to log into amazon queue so amazon queues are id extension to help you write code

00:13:14.860 | we have time you can sign up through a builder id completely free you don't need uh aw's account

00:13:19.740 | you don't need to put in your credit card you can just log in through there i already have an account

00:13:24.220 | so just feed it up but it puts a nice little ai agent there you can ask questions update code etc so

00:13:31.980 | it's uh i'll show you some examples i'll just go through some of the code so

00:13:36.300 | so who's gotten to this point setting up all the models workshops because this is once you get all this

00:13:44.300 | done then that's when the real fun begins so that's getting a pulse if i need to slow down or

00:13:49.340 | slow down a bit okay i'll wait a bit again raise your hand if you're stuck anywhere questions

00:13:54.860 | we have uh agents that can come around and support you so i'm going to pause for a little bit

00:13:59.740 | any general questions while we're waiting

00:14:09.660 | oh yeah so this workshop uh again all the code is available online uh this workshop available as

00:14:15.180 | well so you can also look through that there's a website called workshops.aws

00:14:22.860 | and when you go there you can do something like uh nova act and then it's the only work that that shows

00:14:29.420 | up so you can always go to workshop.aws that search nova act and this workshop was so up so you can see

00:14:35.180 | see all the instructions all the code and run this uh on your own

00:14:41.740 | okay

00:14:52.300 | okay and then last thing uh because we're going to use nova act we actually need to get a nova act api

00:15:00.460 | key so if you go to nova amazon.com it this is a website that you can use the amazon nova model so

00:15:07.900 | you can do like chatting generating images uh speaking with nova uh generate videos but then also

00:15:14.140 | this is where the act api key is generated so if you're following along and you want to generate your

00:15:20.700 | key again it's free to log in you can use your amazon.com uh like when you order something on

00:15:26.620 | amazon.com account to log into this and then you can just generate a key here and they'll be able to

00:15:31.740 | access that

00:15:49.420 | oops oops okay so i'm going to walk through what module one is uh before again has anybody got

00:15:56.380 | in here just quick pulse check if not you know i'll continue i know the wi-fi is slow so it might be

00:16:01.580 | hard so i'll just continue on uh but yeah the first one we're going to see how nova act works uh how

00:16:08.700 | how the actual code looks like uh generated the key i need to export the key and then kind of running

00:16:14.780 | the first script which is actually going to open amazon.com

00:16:18.140 | uh and we're actually going to look for the first coffee maker so let me see how that code looks like

00:16:24.460 | let's go here make this bigger

00:16:27.900 | so very simple code uh with nova act it's again it's all in python sdk i so i decide what a page to go to

00:16:43.980 | so go to amazon.com i say i want you to search for a coffee maker i say select the first result and i say

00:16:50.300 | get the title of that product page so uh very simple if you've ever done kind of web automation before

00:16:55.740 | of something like uh selenium or playwright you probably have to like look for this diff tag you

00:17:00.380 | know look at this h1 tag grab this information a lot of manual processes of actually inspecting

00:17:06.060 | the actual website here i'm just saying click the search bar find something like i don't have to

00:17:10.940 | specify click this tag do that so it makes it much more easier to engage with the website as a natural

00:17:16.300 | human would instead of like looking through divs and trying to find this p tag specifically so

00:17:21.900 | uh this is a great way to just uh you know use nova act right out of the box so i'm going to uh run this

00:17:29.260 | so you can see

00:17:31.500 | examples all right all right so added my key going to what happens when i run this file

00:17:47.500 | give it a second

00:18:00.300 | oops we failed all right let's start over

00:18:06.940 | put it down three let's write one

00:18:16.060 | ah okay i know

00:18:21.100 | i gotta run it with this let's start that over

00:18:25.980 | yeah question

00:18:36.700 | so just explain why is banjo running that command it's running fxv fb it's a frame buffer where it runs your

00:18:45.100 | x11 system what happens there

00:18:47.660 | nova act actually goes and clicks a mouse on a browser that's why it needs to be run like that

00:18:52.780 | otherwise it has no gui so this is just kind of a way to emulate a graphical user interface on this

00:18:58.540 | linux box thank you daco yeah since we're running everything in the the cloud on a browser i'm saying

00:19:04.060 | you know open a browser again but it's already in a browser so that's why i crashed so i have to put

00:19:07.980 | that a frame buffer command uh and yeah the workshop kind of walks through why we did that but you can

00:19:13.820 | see uh what is going on when nova act says i'm going to search for a copy maker i'm at the amazon home

00:19:19.580 | page my task is to search for this so it's understanding what it's doing i see the search

00:19:23.820 | bark has copy maker i'm at the search spark here and now it actually puts the actual log of the actual

00:19:30.460 | html file so it's taking screenshots you can see what it looks like it got the first results i'm on

00:19:35.900 | the copy maker page it selected it and now i've got the title now it says you know what's the title of

00:19:41.580 | this product page all right got this black decker 12 copy maker my task to return the title of the

00:19:51.900 | product page product title it got that and it ended the session and then it also creates a a video log that

00:19:58.380 | i can actually look at to see what it did for each for everything it did in this webm file so yeah question

00:20:06.540 | yeah we'll repeat that it would be a microphone

00:20:19.180 | so the question was does it reason about the page in terms of pixels or in terms of text

00:20:30.140 | yeah so it's actually looking through the actual uh the page itself so you see in this video it sees

00:20:36.460 | it looks at the page i can see what's in the page so it's it's a large language model train so it can

00:20:41.660 | actually see the actual the page is doing so it's not looking at like like the h1 tag or whatnot and

00:20:47.100 | understand the context of that particular page you can see that's a search box okay i'm going to go

00:20:51.900 | click through that search box so yes it understands the pixel level of what's on that actual page

00:20:56.620 | so this is kind of the video it's hard to see make it bigger sped up

00:21:06.380 | so it opens the page it goes to the able to type in coffee maker there

00:21:10.700 | um it gets that information clicks the button so even if all the ads and everything the video can

00:21:16.860 | understand the task clicks that and it gets the information back so that's and that was a couple

00:21:21.900 | lines of code so you can extrapolate to other type of workflows you can do for searching through things

00:21:27.420 | sorry i have another question yeah so when you what i've experienced with these kind of frameworks is

00:21:33.580 | that when you run this on a server environment um services like cloudflare will block the access and

00:21:39.740 | maybe do a captcha challenge how do we solve that using q yeah so with uh so using amazon nova ad so it

00:21:46.620 | doesn't do captures it doesn't do that nature so it's it's meant for like workflows you understand but

00:21:51.020 | yes it's not going to bypass captures and other things of that nature as well so it's made for

00:21:55.420 | like going to amazon.com or look through a booking site but if something that like requires like a human or

00:22:00.540 | it wants again that you can't bypass that

00:22:02.380 | you you wouldn't use nova act for that use case if you need to pass a capture or something else that

00:22:10.860 | use another technology this is not meant to like overtake humans you know it's more like i'm helping

00:22:15.420 | them augment things but not if there's a capture involved and have to use a different technology for

00:22:19.580 | that it's awesome a preview it's awesome a preview yes this is a research preview as well so if that's

00:22:25.900 | a very good use case you know leave feedback on the nova the website so yeah is is human and loop

00:22:31.980 | possible at all with it yet well this this one it's no because i'm writing all the code here so but

00:22:39.500 | again this is python code so i could probably put in something here like you know ask something make

00:22:44.220 | an api call here so this is you know it's a python code you might be able to create some type of uh

00:22:49.660 | workflow that might augment like wait for a human response or whatnot because the browser is happening

00:22:55.180 | in like headless mode but could you make it work with a browser to human is also seeing at the same time

00:22:59.820 | yes yeah so it can pause and wait for somebody put in like a password credentials or do a capture

00:23:04.940 | and then once it receives that works continue on the workflow you could do yeah because right now i

00:23:09.740 | i ran it in headless mode but yes it can also run uh you know to open up the browser if i ran this on

00:23:14.380 | my macbook i would open up a chrome browser and go through that session also if you're running it and

00:23:20.220 | you wanted to bypass something that's two-factor if you're already logged into say amazon.com and then you

00:23:27.180 | run a code it's going to use your credentials in that browser session to continue on to perform that task

00:23:33.180 | so that's something that you can do as well cool so let me oops and then one other thing you can also

00:23:44.140 | do multi uh you know parallel execution so my last my next example is actually i'm trying to find

00:23:49.900 | multiple monitors and i want to compare them all at once so i'll show you how that code looks like

00:24:02.540 | so i can check for the monitor extract information i'm setting you know i want i'm defining what i

00:24:09.900 | want so again i'm you know saying i want to find the price the rating the size uh go to amazon.com

00:24:17.260 | uh i set it headless mode this time so i don't need to do the frame buffer i start multiple threads

00:24:22.460 | it looks for each monitor simultaneously because each of these are individual tasks so i can paralyze

00:24:27.340 | them instead of waiting it to go through i define the list of monitors i want to go through start the

00:24:33.260 | thread and then it starts executing and finds the results of the monitors so i can run that in the

00:24:38.700 | background it was starting with three parallel threads and it's open so again running in headless mode so

00:24:49.500 | it's going to be able to do this in the background where we can see kind of what the model is thinking

00:24:53.340 | how it navigates through the web page yep all right i have tried to use nova in the past april and uh it worked for

00:25:03.900 | the first time but once i did it again it triggered the capture is this something that has been already

00:25:10.780 | resolved or is this happening because i think the website and it was amazon in this case it was detecting

00:25:16.780 | it was a bot and uh is there like an llms.txt or robots.txt that can declare it so nova actor is a github repo

00:25:25.340 | so you could go there and just grab that but it's it's working now like i'm running it you know i just i this is

00:25:30.940 | this live code i'm doing right now like i just exported my api key started running it so

00:25:36.380 | uh you can try it in the workshop i'll be yeah i mean it's ready to go we're building right now

00:25:41.660 | and you can kind of see that it's going on in the background what it's doing

00:25:47.100 | i've looked at this monitor the dell monitor i'm at the amazon home page it's like it's going through

00:25:52.700 | looking through the search results it's saving things so you can see it's running in parallel it got the

00:25:57.100 | information for the one of the first ones so it's going as i just set that up and it can execute

00:26:02.620 | that so if you have some type of uh i don't know like daily news thing you need to go to the website

00:26:07.340 | and get news or something and i have a report nova and there's no api for that this is one way you can

00:26:12.300 | codify how to do that kind of search and get the information

00:26:17.500 | so if you have a question yeah i'm wondering so how successful is this in terms of like more

00:26:22.380 | ambiguous tasks because i i ran the amazon demo and that worked but i'm wondering could i just add google

00:26:27.020 | there sure and and and how like how big and and sort of how much does it know when it's navigating

00:26:33.580 | through like i was thinking like if i wanted to return a pair of sunglasses that broke would i would i be

00:26:39.820 | able to just say like start in google and then find this company's website find a way to you know

00:26:45.340 | engage support open a ticket like how much sort of like how vague can you be and how smart is it

00:26:52.300 | currently would that would that yeah i mean the more instruction you give obviously better but

00:26:56.780 | it's able to understand how to navigate a website that's what the model is trained on so if you say

00:27:00.620 | you know go to this sunglasses website it doesn't it probably wasn't training a specific sunglass website

00:27:05.180 | but it can understand that button support you know if this buzzes click a ticket so it understand kind

00:27:10.060 | of the general knowledge of how to navigate the website but if there's something very intricate

00:27:14.140 | about that website you're going to have to encode it in the text like make sure you click

00:27:17.740 | button x first or whatever so it understands how to navigate websites got it and does it understand

00:27:23.500 | when it's failed yeah sometimes sometimes i've seen it sometimes get stuck in a loop and like oh no

00:27:29.020 | i keep scrolling i keep scrolling i keep scrolling it doesn't know when to stop so it again this isn't

00:27:33.180 | research preview so things are getting better the model is getting updated behind the scenes but

00:27:37.420 | it's not like it's not agi so that's got it and one last question um how is it in terms of navigating

00:27:44.540 | like distrustful parts of the internet i mean there's a lot on the internet that we see and we know is not

00:27:49.420 | to be trusted or it's something not to be followed how have you sort of worked around that problem yeah

00:27:54.620 | because again it is a model in the background so it's going to understand like if you're doing something

00:27:58.700 | it's not going to want to click that or might be there's safeguards in place so that's built into

00:28:03.820 | the model but again uh it isn't research preview you still have to explicitly say what buttons to press

00:28:09.100 | for certain actions but again the model it is an lm train it's going to be able to understand the nuances

00:28:14.780 | and say if it can't take this action or can't do that that could happen but i haven't seen that use

00:28:19.580 | case but if you keep pushing it maybe you'll find those those things well the thing i had in my mind is

00:28:23.580 | like if you go to a site where you have to download a link sometimes there's an ad that says download

00:28:28.860 | a link and you know that that's just an ad trying to get your attention of course would the model know

00:28:33.100 | or is that some yeah if like for example like in the the amazon.com it shows an ad for something but

00:28:39.020 | i said find the first thing was able to scroll past that ad and click something so the model understands

00:28:43.180 | the task you give it so yes it can understand that thank you all right so this this is finished yeah

00:28:51.420 | that's really quick it showed you got it was able to find all the models give me the size the rating

00:28:55.820 | the price reach of the monitors so again it executed that on parallel it got me the nice information

00:29:01.420 | and that that's kind of the idea of like it can do parallel execution in the background so you don't

00:29:05.740 | have to wait for it and don't see it actually clicking through the the task and you get your information

00:29:10.060 | all right one more question then we'll move on to the mcp part so nova is specifically meant to be used

00:29:18.060 | with the browser correct uh so nova act so amazon nova is a family of models on amazon so if you go to

00:29:26.220 | this website nova amazon.com you see there are different foundation models like nova pro premier

00:29:32.540 | light micro these are like the text understanding models so like your typical llm calls there's also an

00:29:38.940 | image model called nova canvas can generate images that the video real called nova real where can generate

00:29:45.100 | videos and that it's also a speech model text speech to speech called nova sonic so nova is a foundation

00:29:52.140 | of found uh foundation models by amazon to do all these type of tasks and act is just another one for

00:29:57.820 | browser automation are there plans to expand this like beyond the browser so that we can someday take

00:30:04.780 | actions in slack or ide or anything outside of the browser maybe some of the team is here so maybe talk with

00:30:13.020 | them later thank you all right so i'm going to move on to the mcp part vanjo yep nova act is only available in

00:30:24.540 | u.s yes right now nova act is only available in the u.s it's in preview so it's just getting started so

00:30:32.140 | if you log in from like a different uh account like address like uk or something might not it won't work so

00:30:37.580 | it only works in the u.s at the moment yes all right one more question over there and then i'm going to

00:30:45.260 | move on if you live in three monitors um i got the same results as you did but i actually got a different

00:30:50.540 | price with the samsung odyssey why do you think that might be oh

00:30:54.060 | your amazon.com is different i don't know yeah because it is opening up a different browser so it could have

00:31:03.580 | clicked something differently yeah so yeah that's right

00:31:07.980 | we can actually look at the video premium or video playback to see what the results were like

00:31:12.780 | yeah

00:31:14.140 | one more okay one more quick quick one are there plans to support persisting browsing data such

00:31:23.980 | as cookies in the cloud browser so right now it's opening up its own browser but you can also set like

00:31:30.060 | your own like chromium profile and open up that browser so all the thing you have saved there like

00:31:34.940 | you want to log into your stuff you can set your own custom browser but right before it opens up a new

00:31:39.420 | like completely new browser without anything saved

00:31:41.900 | all right so i want to show uh i actually made an mcp server for nova act so a module tool is going through uh mcp

00:31:53.660 | and i can kind of show you what i did for the mcp server uh in fact we can use the amazon q here so i'm

00:31:59.740 | going to ask it uh can you tell me about the nova act mcp server

00:32:14.620 | can you tell me what it does what it does it does and oops

00:32:20.380 | so tell me about the nova act mcp server

00:32:25.820 | so you can see it's going through uh integrates nova act browser at mcp it has the browse session tool

00:32:34.060 | browser action execute parallel tasks take screenshots close browser list results so i created these different

00:32:40.380 | aspects of the mcp server so i could use something like claw desktop or cursor or amazon qcli to that

00:32:46.780 | say you know open amazon.com and find information for me so it's it's portable it understands uh so i

00:32:53.020 | don't have to actually write code i can say so go to amazon.com and find me the cop the first coffee maker

00:32:58.060 | it'll actually write all that code i did in the initial one to do that or the multi-monitor so i wrote a

00:33:03.660 | bunch of code to do this if i just said you know get me these three monitors to get the price it would

00:33:08.460 | actually write all the nova act code it needs to do that using the mcp server so that's kind of the

00:33:13.100 | power of mcp that i just describe a task and then i can encode the actual browse action things it needs

00:33:19.100 | to so and then i also made an mcp client that can actually interpret that so oops it connects to the

00:33:28.140 | mcp server it runs the code and is able to use query bedrock uh i am using a model so i'm using claude 3.5

00:33:35.660 | sonnet here because it's an mcp client and needs to have an lm behind that and then it's able to

00:33:40.860 | you know understand which tools to use uh run the code and open up the browser and whatnot so let me

00:33:47.100 | just run the example here so module two

00:33:55.100 | so er open the file just did that we asked amazon q to explain the file to us and now we're actually

00:34:04.540 | going to run it so python 3 and then i can open this up

00:34:23.820 | okay so let's be adventurous if somebody give me a query to try since anyone has an idea i'm going

00:34:30.940 | to just ask it and do something so someone give me an idea of what to run another act fix wi-fi

00:34:37.340 | how would you fix the can you find a website to fix website can

00:34:46.780 | fine fine let's see website to fix wi-fi use headless mode

00:34:54.460 | i spelled it wrong but let's see

00:34:59.900 | all right it goes to google.com how to fix wi-fi problems troubleshooting guy in the box and press enter

00:35:12.540 | return a list of the website title descriptions all right it's going through that so it open

00:35:19.100 | google.com how to fix wi-fi problems i see an empty search bar where i can type queries for search

00:35:25.420 | information i should type how to fix wi-fi problems so you can see it's understanding what to do

00:35:29.740 | you know oh it hit a recapture page so okay the search results are not available blah blah so

00:35:38.140 | so see it looks like it got stuck in a recapture page so this is like a headless agent so someone

00:35:42.220 | asked a question about going to pass capture and whatnot you see that it's it got stuck doing that

00:35:46.220 | it looks like it's stuck in a loop now so it sees the capture again so i should skip the clip button to

00:35:53.740 | skip the capture window the capture is still open so it's probably going to be stuck here unless i close

00:35:58.620 | it so you can see there are limitations it's not going to pass captures and whatnot but that's that was a good

00:36:03.580 | query to show that it oh did it fill it it's still open so it's going to be stuck here so i'm just going

00:36:08.300 | to close it out but you can see you know it can't pass everything it can't navigate through websites so

00:36:13.260 | something like that will wasn't it will not work so that was a great test example to show

00:36:17.820 | if i use the the baked in one you know find that copy made under 50 dollars it'll be able to go through

00:36:24.860 | that and use headless mode but any questions on that seeing how the mcp server is working i didn't have to

00:36:30.380 | write code i just said do something it actually wrote the code to do it for me

00:36:34.220 | question over here

00:36:37.500 | yeah yeah so a question about if i can actually go into the browser and do it myself yeah if i ran this

00:36:46.060 | locally on my machine it will actually be able to you know open up the browser and i can have to click

00:36:50.540 | the button and it'll continue doing that right now i'm running it within the browser so i'm everything

00:36:56.220 | everything in headless mode so we can't interact with that

00:36:58.540 | so you can see it's able to find search under 50 dollars it can actually look at the website

00:37:06.940 | it's found search results on amazon.com so yeah so that for that use case where we're not passing

00:37:12.860 | captures is able to continue and find the information there

00:37:20.300 | so a question about can i actually order something if i use my own browser session and like logged in

00:37:25.100 | at my amazon.com account and said yes order this for me you know click through it will be able to

00:37:30.140 | understand that thing but i would have to put it in i would have to use my own browser session so i

00:37:35.020 | i wouldn't want to log in by myself so yeah

00:37:39.820 | another question

00:37:44.460 | if you give novak the authentication for amazon for example like you give it your login details

00:37:53.100 | then can it log in and complete that action for you yeah but if i say this is my username this is my

00:37:58.860 | password enter that into that field and you'll be able to understand you know this is a sign-in button and i

00:38:03.820 | have this information but again this is all python code so yeah you can encode it you can make it an

00:38:08.380 | environment variable so it won't read it directly so a lot of ways to do that does it also like

00:38:12.540 | understand 2fa let's say it asks you to go to your gmail and you will then open the gmail website check

00:38:19.180 | the email if you're logged in again on your session and then input it or is it okay well if there's no

00:38:24.380 | capture like we just thought of the capture yeah so there's no nothing blocking so but yeah again nova act

00:38:29.820 | is free to use so there's a lot of creativity in this room so i think we should have like a

00:38:33.500 | nova act hackathon i think that'll be you know do something crazy with nova act

00:38:37.260 | all right so one more question yep one more can i book a flight when my price alert is less than

00:38:48.620 | hundred dollars it's like a continuously check you probably use something else for that but yeah i mean

00:38:54.060 | no back and open up that website it can just have a query every day you know open google flights and look

00:38:58.940 | look at the quickest thing and if something is below this dead hold you know send me an email so again

00:39:03.580 | this is all a python script so you can set up something that triggers like once a day like in a

00:39:08.300 | lambda function and so yes totally possible so nova act is very flexible and because it can run in headless

00:39:14.380 | mode you don't need to have that ui so that's really what makes it helpful for interacting with websites

00:39:19.020 | that don't have a native api

00:39:27.740 | thanks yeah this is pretty cool i'm a little bit confused like we have the nova sdk sdk api key and

00:39:37.500 | we're also doing some stuff in bedrock ah yeah so how does this actually work yeah so in that the nova

00:39:45.020 | api key separate but for this mcp client i did it actually needs a large thing with models to understand

00:39:51.580 | what's still happening so if i go to claude oops i actually said i'm actually using a claude sonnet

00:39:59.820 | 3.5 for my mcp server so that's how because i just asked it you know find that website for me

00:40:05.820 | how does it know that about any of the code doing that so it's using a large language model underneath

00:40:10.780 | the hood to actually find that information so that's where we use bedrock for trying to find it in the

00:40:17.020 | code but it's on it yeah i set the model id so your assistant you're an ai system helping you have tools

00:40:24.860 | you're using cloud 3.5 sonnet you're making an api called a bedrock whenever something happens so

00:40:30.380 | that's where the the llm we're using but nova act is separate from that so this m select of using you

00:40:35.820 | know claude desktop it's running an llm in inside of that they would understand that for the mcp server

00:40:41.500 | a question here a question

00:40:47.340 | uh the question is uh does it integrate with browser plugins as well like could it integrate

00:41:00.380 | with last pass if you have the last class plugin fill in the credentials through last pass and then

00:41:05.020 | continue i haven't tried that but again it does you can set up to use your own browser so if you do that

00:41:10.220 | and if that's integrated it might be able to do that and click through that but i have not tested that but

00:41:15.020 | something to try out thank you the biggest problem you will face is two factor like even if you gave

00:41:22.380 | it a password like if you're using something like google authenticator or something that would be like

00:41:27.180 | the biggest problem to capture but other than that if you provide it an environmental variable or if you

00:41:32.380 | give it instructions on how to access last pass in the browser you should be able to do it all right and

00:41:38.860 | uh oh one more question then we'll go on to the last module

00:41:46.220 | so clearly there are a lot of different uh agent architectures you could use um and what i can

00:41:58.300 | imagine using this is uh like you have a coordinator agent set up somewhere that's running in the overall

00:42:04.860 | app and then when something pops up and says hey you need to go and look this up online go and check it

00:42:10.220 | uh it should mod so my question is how modular

00:42:14.220 | i mean it's just python so it should be pretty modular right is that the way in which you're

00:42:19.740 | imagining the architecture to be is just if i was coding a coordinator agent in lang chain or lang graph

00:42:26.300 | for example it would then call your sub-agent and get and and run it stuff and then get and then get

00:42:33.100 | a text-based output that i throw into my message queue that's how it all integrates together is that right

00:42:40.220 | yeah that's one way you can do it so nova act again right it's a python so it could be a tool it could be

00:42:44.620 | an api call and the next module we're actually going to show you how to actually make an agent from that so

00:42:49.260 | good good t up right here uh so um juan talked about the strands uh at the beginning so strands is a new

00:42:57.100 | agentic framework launched by uh aws so let me open up the link uh it's easy as a pip install strands

00:43:04.540 | and the first agent is like agent equals that so it's very it's a model first uh way of interacting with

00:43:11.500 | agents if you use a lot of agent frameworks in the past there's a lot of bootstrapping and making sure

00:43:16.460 | everything is correct and like but that was necessary for kind of the older models like

00:43:20.380 | if you think back to like like llama 2 for example like how how far our models have evolved since then

00:43:25.740 | so but now we we can pass a lot of the you know bootstrapping we did previously the agent can figure

00:43:31.900 | that out so we don't need all these very uh heavy ways and like you know make sure everything's typed

00:43:36.940 | and every so whatnot so here's a very simple example of how i actually spun up uh and also it has mcp

00:43:44.060 | native support so in this example i actually have two mcp servers uh i have the aws documentation and

00:43:51.500 | aws diagrams mcp server so if you go to this like aws labs mcp these are the official um aws mcp servers

00:43:59.980 | and there's a bunch of different ones from like a cost analysis nova canvas diagramming cloud formation

00:44:06.460 | lots of different ones here uh so again it's all on github aws labs mcp but the example i do here is

00:44:13.900 | i'm actually i made like a solutions architect agent your role is to help customers understand

00:44:18.780 | this building on aws and i define these two mcp servers here i give it a prompt and i say this agent

00:44:27.020 | has all the tools in the mcp server it has a bedrock model i'm using cloud haiku here and what's cool

00:44:33.740 | about strands it can also use like light llm and olama so it has access to lots of different things or

00:44:39.340 | you can run it locally and of course it has access to amazon bedrock so that's what we're using here

00:44:43.500 | so all those three things makes the agent the tools the model and the system prompt and then i can say

00:44:50.540 | get the documentation for aws lambda and create a diagram of a website that uses lambda so let me run this code

00:45:18.220 | okay so it uses uv to install the mcp server locally a lot of people i don't know where does mcp run this

00:45:25.100 | is running locally but there are other ways to run it like in a lambda function and whatnot but for just

00:45:29.660 | testing it out it pulls down the the mcp server locally and runs it you can see it's already executing

00:45:35.260 | so let's make this a bit bigger uh so it says okay i'm going to help you with that first i'm going to search

00:45:45.020 | the aws lambda documentation i'll read the documentation then i'll create a diagram

00:45:49.260 | illustrating a static site so you can see it does a post request to do the search so the mcp server

00:45:54.460 | defines where everything is i don't have to like feed it in the well-architected framework the aws

00:45:59.900 | documentation is always updated so it just knows call the search function it got the lambda welcome file

00:46:05.180 | it put that in it's able to generate the diagram it generates the diagram it tells us what is going on how

00:46:12.540 | the workflow looks like it tells me it saved the diagram to this location i can open it up generated

00:46:18.220 | diagrams oops and now it's very small let me see if i can make this bigger

00:46:26.060 | there we go so i was able to generate the diagram for me so all through that about uh you know 40 lines of

00:46:35.820 | code i have two mcp servers i have my prompt and is able to understand that get that and just generate

00:46:41.340 | something for me with that uh so that's very easy to get started with strands of building agentic

00:46:46.700 | workflows i know agent means a lot of different things to different people but you know if you have

00:46:51.580 | tools the model the system prompt do some type of action and strands makes it extremely easy to do that

00:46:57.820 | if i use other frameworks it could be a lot more code to do something like that especially integrating mcp

00:47:02.940 | natively like that i'm going to pause here for any strands questions

00:47:08.300 | there's coming

00:47:15.020 | um i know bedrock already had its kind of agents sdk so is strands replacing that or is this now the pro is

00:47:25.580 | this replacing that or is supposed to complement that like is this the preferred way of creating agents with

00:47:30.620 | models in bedrock yeah well when it comes to preferred way it always comes down to your use

00:47:34.860 | case so the bedrock agent has a lot more i guess opinionated ways to do things as you can do through

00:47:40.220 | the console it has a built-in support uh right there in aws well strand is more it's an open source

00:47:45.900 | framework so you can download the code you can use other models through that like light llm or llama

00:47:51.100 | if you use bedrock agent you can't run that offline so there's different use cases different developer

00:47:56.220 | tooling i mean me the software engineer i like you know code first doing things so it does

00:48:00.540 | depend on your use case what you're trying to do in your experience okay can you show the code real

00:48:04.700 | quick yeah yeah this is the code yeah just show the agent so this is an open source framework if you go

00:48:12.940 | where it says agent you and it says model right now we're using a bedrock model but you can use another

00:48:18.140 | model with light llm yeah so you don't need aws at all in that instance

00:48:23.420 | you can use all llama you can use open ai you can use right

00:48:26.940 | yeah so there's documentation and topic light llm a lot of different model providers

00:48:37.660 | olama open ai so it's an open source framework so you can use it whatever you want so but yeah that's the

00:48:43.100 | idea of a strand open source model agent development kit one question suppose i want to build a text to

00:48:50.860 | sql agent and i have say 15 tools already built in that i want this agent to be able to use if i use this

00:49:00.060 | framework how can i make sure that the agent know when to use the right tool and the sequence yeah great

00:49:10.060 | question uh so i didn't this example i have a weather agent so one thing you said you already have tools

00:49:17.580 | what i like about strands a lot is i can write a python function i already have and let's put this tool

00:49:23.100 | decorator and that's it you know you don't have to put anything else it understands this is the uh what you

00:49:27.900 | need to do and then when i'm going to that agent i have this tools and it has put in the the native tools

00:49:33.900 | we're going to be using http request is a standard tool in the strands framework so in this example i'm

00:49:40.220 | like asking what is the weather in seattle and then also how many words are in this response

00:49:44.780 | this is open api api weather.gov where you don't need an api key and it can find the information for

00:49:51.340 | you so i'm going to update this san francisco and this started wrong but it's to figure it out a

00:50:01.100 | weather example where the word count and i was very specific you know find the weather first and then

00:50:06.300 | how many words are in the response so it's able to use that tool it gets the forecast and then it knows

00:50:11.340 | to use that word count tool next so we're passing a lot of the information to the model the models are very

00:50:16.940 | smart now we don't have to say do this do this do this the let the agent figure it out that's kind

00:50:21.500 | of the role of the agent you give it the context and tools necessary it figures out the best way to

00:50:26.300 | solve the problem but then wouldn't it be prone to hallucination when you give it 20 tools and then

00:50:32.300 | because we've tried that with aws no the similar things when you bind more i think more than 10 tools

00:50:41.020 | it's going to sure it is always you know a balance but i'm again the models are much better like try

00:50:47.180 | using cloud force on it is it hallucinating as much like these newer models are much better for

00:50:51.580 | understanding the concept and understanding what tools when the older models sure they get confused

00:50:56.620 | there's so many things but i'm very confident on these newer models they can understand your use

00:51:00.860 | case and what tools available and figure out the best way to solve the problem so then with this framework

00:51:05.340 | there wouldn't be a way for you to orchestrate a customized flow but more like you give the

00:51:12.620 | control to the agent you could if you want to have like specific like do this specific way uh there are

00:51:19.020 | different ways in strands uh with something called workflow mode where you actually say uh you know this

00:51:26.060 | is the workflow i want to do research results analyze things write the final report if you have to do

00:51:31.260 | something very sequential a strands has that i won't have time to go through all the different you know

00:51:36.540 | ways to do multi-agent collaboration and whatnot but this for that specifically like i wanted to do x y

00:51:41.900 | z first the workflow away can do that so yes then is it possible say um i i don't have a predefined

00:51:50.620 | workflow but i know it needs to figure out the right workflow then then that's what i just did there

00:51:56.140 | you know i just gave it a sentence and figured it out i see yes okay perfect thank you

00:51:59.580 | so cloud four has something called interleave thinking i believe that's what it's called where

00:52:08.140 | it can handle multiple tools processing much better than most models today so if you're passing in 20

00:52:14.860 | tools it's able to work through the agentic loop to really figure out which tool to run and it's also

00:52:21.340 | able to run parallel tool calls so rather than just say okay here's the objective let me run this tool it

00:52:28.860 | can say here's the objective let me run this tool this tool this tool this tool and this tool and then

00:52:33.340 | process the results and determine when needs to happen next so i would try a cloud for which he like

00:52:39.660 | banjo mentioned then last example really quick uh again you know strands i made my nova act mcp server

00:52:48.220 | and it can actually run that you know i had to find this is the mcp server use the nova act mcp you know

00:52:53.900 | use the cloud so same type of thing i could have another agent you know use uh nova act as well

00:52:59.500 | so strands make it very easy to build these agentic workflows uh so i really enjoyed the developer

00:53:06.220 | experience of using strands and you know i already have the mcp server we see the same exact example

00:53:11.500 | before so once you have the mcp server it's very easy to plug in into different uh architectures and

00:53:16.860 | strands makes it very easy to accept that but yeah those were the three modules really about how to use

00:53:25.980 | strands uh mcp then amazon nova act again strands is open source you can download it pip install strands

00:53:35.260 | if you just type strandagents.com it'll take you to the documentation uh again also nova act

00:53:41.340 | nova amazon.com it's free to log in and then i think that's all the time we have but we do have a

00:53:50.300 | survey uh and you can get aws credit code by filling out this survey so banjo i have a question about that

00:53:57.100 | workflow thing in strands when you create these individual agents can you define which tools are

00:54:03.660 | passed on to each agents yeah yeah it's a great question dark about different agents uh running out

00:54:08.220 | of time but i'll quickly show uh i have a multi-agent example i believe

00:54:13.340 | oh i think you had it in the docs yeah yeah it's in the docs yeah yeah yeah each of these is a different

00:54:21.980 | agent so you know this is an agent you can have a different system prompt you have different tools so

00:54:26.700 | you're just defining the agent and then yeah you can have different tools different whatever there

00:54:30.540 | different models and then the workflow would just call that so yes completely customizable so that's

00:54:35.420 | the good thing about strand it's very easy to customize and build uh scalable solutions like that

00:54:39.580 | thank you and then again uh here's the survey you can get aws credits for filling out this thing

00:54:45.820 | tell us how we did what you liked what you want to learn more and now go build

00:55:03.260 | yeah any other questions i think we have a minute

00:55:12.860 | thanks for the presentation um so as these systems develop i think that it's reasonable to assume that

00:55:21.900 | they would emerge as an increasingly effective vehicle for committing fraud online at scale which would push

00:55:29.740 | businesses to implement more things like captcha which kind of decreases the surface area that

00:55:35.580 | tools like this would be applicable so what is the long-term strategy for that well you already saw we

00:55:41.660 | failed with captcha today like you know we're not trying to back capture we're not trying to break things

00:55:45.820 | you know a responsible ai is very important to amazon so no we're not trying to let this tool commit fraud

00:55:51.100 | you know you have to have an api key so it could be monitored so use cases like that will be shut down

00:55:56.220 | we're done

00:56:00.540 | i think we're done yeah so thank you all

00:56:05.180 | i think it's finished

00:56:09.260 | oh we can keep going we have more time oh the clock the clock ran out so i thought we were kicked out

00:56:19.980 | all right well more questions then i guess i thought yeah another question

00:56:26.940 | so regarding nova act let's say that i have a headless browser in the cloud is there a way to connect

00:56:42.540 | nova act to my custom browser instance in the cloud yeah yeah yeah you can there's a way to like put your own

00:56:48.140 | browser instance so yeah novak supports that so oh pretty possible yeah thanks

00:57:00.700 | all right let me go to novak github page

00:57:07.100 | and it's just some examples there

00:57:17.180 | i think right because it says start at one and then you have 120 minutes

00:57:20.700 | i know i just told him okay but i think what happened that time

00:57:23.340 | oh yeah no you guys can keep rocking it okay yeah

00:57:30.060 | so yeah there's a way to set up your own user agent for nova act so definitely possible

00:57:44.220 | there's a lot of time so i don't know if anyone actually got into the workshop so we can still

00:57:58.780 | build some stuff or i can try some other examples

00:58:11.260 | that's a lot of time so we can try to make a streamlit app with nova act so we can try that one

00:58:16.300 | so we can try to make a streamlit app so we can try to make a streamlit app with nova act so we can try if that works

00:58:21.340 | so we can try to make a streamlit app with nova act so we can try to make a streamlit app with nova act so we can try if that works

00:58:29.420 | oops oops

00:58:44.460 | oops

00:58:46.460 | oops

00:59:02.380 | so one example i tried i tried to make a streamlit app that uh look for like the top five uh playstation games on game faqs and then create an image like a nice graph for me

00:59:16.460 | but it can't fail so uh i think that's one of the issues there i think it failed at one of the steps

00:59:22.140 | there uh let's see oh that nova act got an error so it couldn't navigate gamefaqs.com so it does it does

00:59:33.660 | fail at some of the things so that's you know again research preview you have to be more specific on how it goes

00:59:39.020 | through things uh oh yeah let me show you where the code is just so you can have an example let me pull up

00:59:45.500 | the code

00:59:45.980 | yeah let me try let me set up my local machine so we can see how it works

00:59:55.500 | oh yeah go for it does nova act depend on like uh semantic html and like good web design to actually

01:00:12.700 | work i mean it understands the actual page so it can click through those things but if the if the page

01:00:17.900 | like doesn't have like a search box or button and not be able to navigate so as long as the page it can see

01:00:21.900 | the page it can see the page understand where to click and then click those correct buttons so

01:00:25.740 | i get maybe a follow-up is there any like efforts to do like experimental like engagement on the page

01:00:33.100 | so if it comes on a page that it's not familiar with maybe it would try and act like a human would

01:00:38.380 | to like click on things or try things out depending what you you put in that prompt because again you're

01:00:44.060 | creating that workflow what it should do so if you say you know explore this website and find thing

01:00:49.020 | they will try to click through that but again it's up to kind of what that initial prompt is that you

01:00:53.820 | have for it yeah when you're using over act you're kind of giving it step-by-step instructions when you're

01:00:59.260 | using the sdk so that way if you kind of know it's an obscure website you can give it those instructions

01:01:06.540 | that it needs to perform rather than the mcp server is using natural language to infer what needs to be

01:01:13.660 | done so there's not specific instructions coming from you unless you provide it

01:01:27.100 | yeah so i'm going to run it locally on my machine just to show an example let's see

01:01:32.140 | oh let me hide my key for a second because it's been recorded

01:01:43.820 | all right

01:01:54.860 | pipe on get coffee thank you for coming

01:01:59.900 | all right so i'm just running it locally on my machine so without headless mode so you can see

01:02:07.740 | it opens up the browser

01:02:08.860 | it's able to type copy maker

01:02:17.180 | so what we're looking at now is not in headless mode this is actually nova act actually

01:02:24.620 | performing the task in the browser so yeah a lot of questions about how does it work you know

01:02:30.220 | and we can try more complicated examples i just wanted to show it can work on your machine

01:02:34.700 | and you can see the log you know i'm just looking for and if i like change the page while just doing

01:02:43.420 | something it's going to like mess up so i'm going to click the page and see what it does like so someone

01:02:47.580 | asked about click things of that nature what's it going to do now

01:02:54.380 | so see it crashed now because i brought i changed the different page didn't know what to do so

01:03:03.740 | examples you can interact when it's when it's going through the motion as well and then

01:03:09.580 | i believe i have an uh consider the mcp server i set up a clod instance

01:03:20.140 | oops

01:03:24.940 | and then i have a my nova act mcp servers there so i'm able to actually you know i click this you can see

01:03:32.060 | all the tools it has available so i can ask it to like navigate a website so

01:03:36.860 | anyone having a complex example you can see the mcp server so i know some people have been asking some

01:03:45.260 | complex examples so go ahead and give me one yeah you got you got one one one one question i had is uh

01:03:53.820 | can nova support like drag and drop functionality you can try it do you have a specific website that

01:04:01.820 | that has like drag and drop

01:04:16.220 | draw io uh let's go to draw.io and make a cool diagram use nova act

01:04:27.900 | i want to see what happens

01:04:32.700 | all right so let's go to draw io all right it opened the page

01:04:45.660 | do i have to accept something nope it's going oops all right open dry let's see

01:04:50.380 | and then i'll make this smaller

01:04:56.700 | wait for page to load look at my initial setup for template selections all right it's going

01:05:08.860 | oh it crashed what happened oh do i have to allow allow always i took a screenshot

01:05:18.460 | i need to continue the browser session to see what's available let's look at the screenshot all right it's

01:05:31.180 | it's opening up again uh it's going to draw io

01:05:36.300 | yeah if i keep clicking away it clicks back to the die the browser session so i need like two monitors

01:05:57.180 | let's see let's see let's see is it going to figure out how to use draw io

01:06:00.620 | wait for page take screenshot look for template options come a blank paper all right it's

01:06:08.860 | so it's kind of i didn't give it any specific instructions i just said make something cool so

01:06:14.060 | maybe that's too hard to interpret for this website maybe i have to say click this click the square button and

01:06:19.820 | then drag the square to the center or something i might have to be more explicit for that

01:06:24.220 | so it seems it seems to have frozen all right it's clicking something all right click new

01:06:32.540 | oh okay it's doing stuff

01:06:43.420 | again it's not like super real time it's going it's not like instantaneously but it's it is clicking

01:06:48.540 | through the buttons clicking through stuff all right did it do anything oh the claw

01:06:56.460 | so it looks like it failed so yeah it looks like clod failed that one so i won't blame no for that but

01:07:03.420 | that's like that's idea so thanks for trying to do something hard

01:07:08.540 | okay another question back there oh yeah can we try another one yeah let's try another one sure can

01:07:14.700 | we do uh you know on google maps find the top three rated coffee shops with within a mile radius of

01:07:22.380 | this hotel top three coffee shops shops near the marriott marquis in san francisco

01:07:34.540 | you'll figure it out

01:07:41.980 | all right open maps google search mirror marquis san francisco wait for results to load so it has a

01:07:50.940 | plan it's going to do something so let's see it opened google maps

01:08:01.340 | all right let's have mirror marquis san francisco so it's able to type that

01:08:04.460 | okay it searched it found the mirror marquis

01:08:13.820 | so there's a copy button let's see if it clicks that i'm curious

01:08:23.420 | looks like it's frozen give it a couple more seconds

01:08:40.540 | whether to click it got this 15 minutes i was trying to type in that box okay

01:08:51.980 | all right

01:09:00.380 | all right it's typing coffee shops all right all right it's going

01:09:05.900 | all right so all right it'll open the coffee shops and let's see if we can get those top three there's a

01:09:18.780 | four eight four seven another four seven let's see if it can get that

01:09:31.100 | did it crash

01:09:39.660 | i think it did it but i think

01:09:44.300 | i'm gonna blame claude claude desktop might need a different mcp client

01:09:48.620 | yeah

01:09:52.380 | i think yeah i think claude desktop doesn't like doing that but again because it's an mcp server i can open

01:10:01.260 | up a different mcp client so i can open like cursor for example and ask it questions through that

01:10:08.300 | cursor

01:10:15.980 | let me go this

01:10:18.300 | and then

01:10:20.620 | you see it has the mcp tools oops it has this up let me just open up a new one

01:10:27.340 | i can do the same thing and use nova act

01:10:37.580 | and then it's called the mcp tool again so that's the beauty of mcp i already have this server i can

01:10:46.780 | just use a different client it can understand all the information it needs to and do the exact same command

01:10:51.500 | so it's going to do the same thing cursor might be smarter than claude codes

01:10:56.140 | but yeah it's able to do the exact same type of thing so

01:11:00.380 | another question over here

01:11:04.780 | yeah i just got a question on the

01:11:07.100 | the nova act model yeah

01:11:08.620 | that model is that that is that running in the cloud

01:11:11.980 | yes so nova act question was where is nova act running and yes it's running in the cloud

01:11:16.140 | so yeah it's just you get that api key and it's doing the call behind the scenes in the aws cloud

01:11:21.100 | so then what what does it upload to the cloud

01:11:24.060 | well it's asking the questions and like you know go to google maps and then they would say i understand

01:11:29.260 | that and it's actually clicking those buttons and doing the actions so the actual uh intent of

01:11:34.940 | what you're trying to do in the specific action

01:11:39.100 | and then if i was if i was using it locally you couldn't use nova act locally it has to be uh connected

01:11:47.500 | to the internet to use it okay but if i for example though if i i wanted it to like look at my gmail

01:11:55.980 | oh yes

01:12:01.100 | ah yeah i see what you're saying yeah yeah it's you it's i mean it is you know it's an api endpoint

01:12:06.220 | it's been past the aws so you know only passing information that you feel like it's not going to be

01:12:10.620 | without training the data or taking any of that nature but it's going to the aws cloud and processing

01:12:15.420 | you know what to click on this button locally on your like browser

01:12:24.460 | so looks like it's not yeah see now it's even certain the rating it actually knows which rating

01:12:29.100 | to press so

01:12:31.740 | right yeah yeah well nova act is executing like in this mcp server example i say you know

01:12:43.020 | find the top three coffee shops in marriott near the marriott marquee and then i'm passing that

01:12:47.980 | information that the the llm is understanding that plan and then it uses nova act to interact with the

01:12:53.740 | browser because like cursor or cloud code or amazon queue they can't interact with

01:12:58.700 | the specific uh you know website by itself it uses it uses nova act to do that right but like given a

01:13:05.020 | question though like how how does it come with uh come up with the plan oh the mcp server like that the

01:13:11.340 | client so i picked the model in the example we had the mcp client we had as we showed the model

01:13:17.020 | right like the cloud 3.5 yeah that's coming up with the plan same thing here you know i asked you know

01:13:22.460 | help me find the top three coffee shop near the marriott marquee this the model that uh cursor is using

01:13:28.620 | is coming up with that plan and then i'm using the nova act mcp server to act on it exactly so this is

01:13:34.460 | the plan search for marriott marquee click the marriott marquee you know search for the things

01:13:38.300 | and you see all this information nova act return and actually it will return this time so i think

01:13:43.980 | the problem was with claude desktop but it got the three top three copy stops there right what are all

01:13:49.420 | the tools that uh nova act can do today uh so the mcp server is what i wrote so uh but the idea between

01:13:56.860 | nova act again it can interface with the web browser that that's the tool the browser is the tool and it can

01:14:01.740 | anything on the website you can actually click through go through the example etc i see

01:14:05.980 | you got the repo you got an architecture that showed the mcp just so they can see it yeah so i mentioned

01:14:15.820 | uh there's an official aws mcp servers so uh this aws labs mcp and a lot of different um mcp servers here

01:14:25.340 | for the one the nova act one i created my own one uh go back to the nova act examples or where do the

01:14:32.700 | ah here when i use amazon q to explain you know the mcp server for like what what's going on what tool

01:14:41.180 | was the browser session performing an action on the browser so this is a good uh thing to talk about so

01:14:47.580 | can you dive deeper on the browser action function and we can see because this is how it's actually

01:14:57.180 | acting so uh amazon cube browser action is designed to perform actions it has this uh what's cool about

01:15:07.340 | it it just does oops it's going to the code it performs a single action in the nova act browser

01:15:15.260 | so it's executing that action it stores this act.act is like what nova says you know click the search

01:15:21.740 | bar do this x y you know my the mcp client understands how to use this act.act it passes

01:15:29.260 | the correct action so we saw the example here one of the actions was like go to google maps or click

01:15:35.820 | this button or do that search that's how it's able you know these actions and then the nova act mcp server

01:15:41.980 | is translating that to actually click that button so the mcp server provides all the interfaces it

01:15:47.820 | necessarily needs so then these mcp clients can interact and do actions and do things yeah and nova act

01:15:54.620 | is just the model in the background that's able to click those buttons extending this question it so your mcp

01:16:03.340 | server so claude uh or cursor running locally right it's calling your mcp server that's also running

01:16:10.380 | locally is your mcp and your mcp server is the one that spun up the i guess the chromium instance yeah

01:16:16.220 | is it is your mcp server taking screenshots of what you see in chromium and shipping them to nova to nova act

01:16:23.900 | the screenshots are locally and then based on that like you can see it's actually getting all the

01:16:27.900 | information uh the final page information so it's not storing your screenshot data and sending that

01:16:33.100 | everything that it's running locally it's clicking those buttons based on what's on the browser sensor

01:16:38.140 | got it but is is any of any of the information in chromium does that any of that need to be sent into

01:16:42.860 | any no no everything running yes running locally okay distinction okay perfect thank you

01:16:56.220 | and let me open up the

01:17:09.900 | where was that looking so one of the things about making mcp servers is you have to provide a lot

01:17:18.540 | of context so uh for nova act like i say you know when writing active for nova action be descriptive of

01:17:24.860 | what to do you know click the hamburger menu icon go to order history don't find my order so the more you

01:17:30.620 | know uh concise and just prescriptive of what you want to do it's better you know search for hotels in

01:17:36.140 | houston so by average customer like so the better specific it is uh that's how the mcp's uh client

01:17:42.700 | is able to make those great requests and find the information so type copy maker search rock enter so

01:17:48.700 | the more prescriptive you are of nova act the better results you're going to be and i encoded that all into

01:17:53.260 | this uh mcp server so the clients can leverage that so i think that's probably one of the hardest things

01:18:00.540 | about making the mcp servers that's making sure you provide the next context of when to use the tool how to use the

01:18:05.980 | tool the inputs and outputs but once you solve all that it's very easy to plug and play the different mcp

01:18:11.740 | clients like we've done here

01:18:13.820 | so

01:18:39.500 | question yeah

01:18:52.140 | right so when nova act is doing something it's passing back the log of everything it's doing so you

01:18:57.500 | know what what steps it did so the starting page the add the results the action result id so it's keeping a

01:19:03.420 | of log of everything it did uh power so it's able to get that json i understand what the id what the

01:19:08.860 | result is so you can see what it's doing so it can move on to the next step yep

01:19:21.180 | other question

01:19:27.900 | sorry a quick question is this able to do uh like uh automated ui testing because of this

01:19:35.100 | well with nova act you know you can define like what you want it to do so you're going to have to define

01:19:40.940 | you know go to this button click this does this work so you can define that workflow so i mentioned

01:19:46.460 | before like back in the day like if i'm writing selenium code i have to go click this h1 tag do

01:19:51.260 | this like now you can just write a natural language you know click this button click that button so yes

01:19:55.580 | it can handle that use case uh specifically of like opening the browser or checking these things and

01:20:00.300 | but you have to like you know this nova act search for coffee maker you know you have you specifically have

01:20:05.580 | to write what buttons to press yeah thank you

01:20:12.860 | let's see i guess we have time i can show some multi-agent collaboration with strands that could be

01:20:25.020 | something cool uh i think i have a repo for that so should be uh go to the aws labs page where's that

01:20:37.900 | let's work

01:20:39.980 | and then claude

01:20:45.500 | cool

01:20:49.340 | okay i'm just gonna copy this code and put it into our environment

01:21:14.700 | so in this example i'm actually going to show how strands as multi-agent collaboration so one way

01:21:33.580 | i'm actually going to create a powerpoint presentation based on uh you know a cloud migration request i want

01:21:40.220 | to like move my uh infrastructure on premise to the cloud give me a presentation of how i would do that

01:21:46.620 | and so for this i created three different agents i created a cost analysis agent so i have a system

01:21:52.540 | prompt there a solutions architect agent does a map out what you're going to be doing and then each of

01:21:58.780 | these uh tools is an actual agent so this uh cost analysis has the docs mcp server the cost analysis mcp

01:22:06.940 | server it has its own prompt the presentation agent has its own system prompt it has a tool from the

01:22:13.100 | a powerpoint mcp server that i'm using and there's an architecture agent it also has you know its own

01:22:19.180 | specific tools system prompt etc so different agents for different uh things in the workflow and then i have

01:22:27.900 | this orchestrator agent welcome the migration orchestration agent it has a prompt and i tell

01:22:32.540 | what tools it has access to and then the cool thing with strands is i make this orchestrator agent and

01:22:38.620 | then the tools are just other agents in that so it knows when to call this agent for this particular tool

01:22:44.460 | when to do that and i say you know i want to migrate my work my uh workload so write the fight tools to find

01:22:52.940 | that so i made a fictional company called shop easy e-commerce they have on-premise java my sql database

01:23:00.140 | yeah i want zero down from migration like all this all these little constraints in there and i wanted to

01:23:06.060 | make a migration plan and a powerpoint presentation that i can present to my executives of how this would

01:23:11.580 | work and i just designed and i'm the orchestrator agent will find out what to do i don't specifically say

01:23:18.060 | do this one first do that first we'll let the the agent figure that out so let me run that strands

01:23:26.540 | and it should be multi-agent

01:23:33.020 | all right so cloud progression agent as tools all right again so all the mcp servers running locally

01:23:42.780 | it downloads it's using the ux it starts with the architecture design first generates a diagram

01:23:54.940 | i'm going to use waft so take some time it might fail but it will just update update itself

01:24:00.700 | making another judgment

01:24:03.420 | all right i think it couldn't generate the diagram there but it's saying all right i'm just gonna this

01:24:17.580 | this is what the diagram should have this is what we're going to doing

01:24:24.140 | now i'm just going to do a cost analysis cost analysis on basically the things we did there so

01:24:31.500 | it's it's a this this workflow takes me a couple minutes to run but you can see it's calling all these

01:24:36.540 | agents uh different things it's understanding what to do what actions to take first it's finding pricing for

01:24:42.780 | eks because it has a cost analysis tool and knows where to find that information so it has the up-to-date

01:24:49.340 | pricing all the time funding for aurora for its database so it's able to understand all that information

01:24:55.100 | and get the real-time up-to-date information just because we have that uh pricing mcp server from the aws labs

01:25:02.460 | it's called cost analysis yeah cost analysis mcp server documentation all the stuff you need for

01:25:15.740 | finding the right price on aws it has all that information and the agent was able to just use that

01:25:21.420 | one that's going to generate report so it's still running again this does take a while because i'm

01:25:32.460 | asking a very complex question a lot of things going so it does take a couple minutes to run through all

01:25:37.660 | that it gets its monthly spend predictions monthly savings etc so they would understand all the information

01:25:45.260 | and get all up-to-date information based on the plan we provided and the last thing now wants to create

01:25:53.020 | an executive presentation so download the powerpoint mcp server and now it's going to make a powerpoint

01:25:57.980 | presentation based on that so adding the title slide so you know add a placeholder so generating powerpoint is a

01:26:08.700 | very popular use case and there's an mcp server that can go ahead and just do that add bullet points

01:26:13.980 | etc so give it a couple another minute or two