Building Agents with Amazon Nova Act and MCP - Du'An Lightfoot, Amazon (Full Workshop)

00:00:00.000 |
building agents with Amazon Nova Act and MCP I'm excited today because we're going to build 00:00:25.680 |
intelligent autonomous AI systems that can help you build scale and improve your applications 00:00:35.440 |
and business my name is Du'An Lightfoot and I'm 00:00:54.960 |
joined by hey I'm Banjo Obayomi I'm a solutions architect here at AWS now this is the AI Engineer 00:01:03.480 |
World's Fair and I've been in tech over 15 years and right now is the most exciting time for me in my 00:01:13.560 |
entire career and one of the reasons for this excitement is agents how many of you right now 00:01:20.640 |
are building agentic systems I love it so when we talk about agentic AI I think it's important that we 00:01:31.800 |
level set from an AWS perspective there are three key terms we need to think about first the ability to 00:01:41.160 |
plan the agent gets a prompt it gets an objective and it determines the actions that need to be taken so 00:01:49.680 |
it creates the plan and then it acts on that plan by using things like tools now the last piece 00:01:58.080 |
the third piece and probably the most interesting is the reasoning where the agent is able to evaluate 00:02:06.600 |
the results and determine if it needs to update the plan and take additional actions until the objective is complete this is an agent 00:02:16.860 |
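The plan, act, and reason loop described here can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `search_catalog`, `pick_first`, `plan`, `act`, and `run_agent` are hypothetical names, and the `plan` function is a hard-coded stand-in for the decision an LLM would make in a real agent.

```python
# Hypothetical tools: ordinary functions the agent is allowed to call.
def search_catalog(query: str) -> list[str]:
    return [f"{query} model A", f"{query} model B"]


def pick_first(items: list[str]) -> str:
    return items[0]


def plan(state: dict) -> list[str]:
    """Stand-in for the LLM: list the steps still needed to finish."""
    steps = []
    if "results" not in state:
        steps.append("search")
    if "choice" not in state:
        steps.append("pick")
    return steps


def act(step: str, objective: str, state: dict) -> None:
    """Take one action by calling the matching tool and storing its result."""
    if step == "search":
        state["results"] = search_catalog(objective)
    elif step == "pick":
        state["choice"] = pick_first(state["results"])


def run_agent(objective: str, max_rounds: int = 5) -> dict:
    state: dict = {}
    for _ in range(max_rounds):
        steps = plan(state)              # plan: decide what is still needed
        if not steps:                    # reason: nothing left, objective met
            break
        act(steps[0], objective, state)  # act: use a tool, then re-plan
    return state


if __name__ == "__main__":
    print(run_agent("coffee maker"))
```

The `max_rounds` cap is a design choice for the sketch: it keeps the loop from running forever if the plan never empties.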
now when we actually break down the architecture I think it's important to take a look at this because we have the user input we have the agentic system we have the possibility of some type of human in the loop and then we have the generated response 00:02:34.860 |
now when we dive a little deeper there are some components of this agentic system we have the LLM we have a knowledge base with external information that we may want to provide we have guardrails to say to the model don't do this or to ground the model with the truth from our knowledge base to say okay is this actually relevant information is this accurate to the information we're receiving from the knowledge base 00:03:02.860 |
and then we have access to additional tools memory or we may need to talk to additional agents or LLMs like Amazon Nova Act through something like MCP and we have the ability to design our own flows for these systems 00:03:18.860 |
now the most interesting piece that I think a lot of us are probably focused on when we're building these systems is around the continuous evaluation framework like how do we know if we're using the 00:03:30.860 |
right LLM how do we know if our prompt is consistent accurate or even optimized for the performance we're expecting and then how do we even judge our system how do we rate that and determine that it's actually solving the problems that we need or intend 00:03:46.860 |
now once we have this we need to log this information and then have some type of subject matter expert determine how we can improve this system and this is the iterative approach so we're always trying to improve 00:03:58.860 |
and optimize our agentic system now continuing on with this story there are some use cases that we should be building these systems for like if it's complex tasks and we don't know which tool should be used how many tools should be used and we want the model to leverage its reasoning capabilities well this is a great use case for an agentic system but if it's something that is just one step 00:04:26.860 |
our traditional if this then that approach is probably the best solution right we don't always need to provide some type of agentic system for something that can be done with a traditional solution 00:04:39.860 |
now when we talk about agents on AWS there are three approaches and perspectives we should think about 00:04:46.860 |
first it's going to be the specialized using something like Amazon Q how many of you have used Amazon Q there's Amazon Q in the console to help solve your problems on AWS 00:04:59.860 |
there's Amazon Q Developer inside of your IDE and right now the one that I'm most excited about is the Amazon Q CLI agent how many of you have used that 00:05:08.860 |
for me if you are into increasing your productivity using a CLI agent has helped me tremendously from editing a video it can do that summarizing a document reading my entire code base like today for one of my demos I had some code and I was trying to figure out why it wasn't working I said analyze this code and tell me what you see 00:05:36.860 |
let me know the APIs that it's calling well I looked at the APIs and they didn't match my APIs in the API Gateway so when the code was deployed it wasn't deployed with the right APIs so the agent was able to help me save a ton of time by just analyzing the code and telling me what it saw because I had never seen the code before right so that's what these tools are able to help us do the next is fully managed if you're using Amazon Bedrock you're able to leverage Amazon Bedrock Agents to 00:06:04.860 |
build and manage agents inside AWS and today what we're going to be focused on is the DIY the do-it-yourself approach by using Strands Agents this allows you to not just leverage Amazon Bedrock but also leverage models through other providers using LiteLLM 00:06:24.860 |
now when we talk about Strands Agents Strands Agents was announced about a month ago this is open source extremely lightweight 00:06:34.860 |
so if you use other agent frameworks it's like that but you'll see in the code how easy it is to build an agentic system or agent itself in a few lines of code and get started I built a multi-agent solution in under 50 lines of code 00:06:52.860 |
and so when we break down Strands Agents there are three components we have a prompt we have an LLM and we have tools so you create a function called let's say a get weather tool right you define your agent you give it a prompt and it's already implemented and you'll see in the code 00:07:14.860 |
as banjo goes through here in a moment now taking it a step further as danielle presented today on amazon nova act 00:07:24.860 |
these models are able to do some really cool things and this is another thing that i'm excited about amazon nova act is a research preview model and its capabilities allow you to use a prompt or give instructions 00:07:42.860 |
and take complex tasks and do things like browse the internet to do research or search on amazon.com to find the top list of widgets 00:07:52.860 |
right and then return them and then add them to your cart so you'll see how we can leverage this not just using the sdk for amazon nova act but also by leveraging mcp which leads us into the last piece which i think when we're talking about agents 00:08:10.860 |
i don't think we would be here today as fast as we have moved if it wasn't for mcp how many of you are 00:08:17.900 |
leveraging mcp today model context protocol how many of you have built your own mcp servers 00:08:23.580 |
i built several um i got two that i use all the time one how many of you use obsidian 00:08:33.580 |
okay so for my documentation i built an obsidian mcp server this allows me to save all my documents 00:08:41.500 |
reference all my documents and just my entire workflow is streamlined because of this mcp server i use right 00:08:48.780 |
there but i also use one for my bookmarks i built a bookmark manager because every friday 00:08:54.620 |
i'm restarting my computer and i lose my bookmarks i save them and i forget about them but now i can just 00:08:59.500 |
say save this bookmark it gives it a description gives it a title gives it a date and i can even add 00:09:04.380 |
notes so i can remember what this bookmark is so now when i open up q cli i can say hey 00:09:10.140 |
i'm looking for um some information on mcp can you tell me all the bookmarks that i have then it'll 00:09:15.260 |
find it can you tell me the ones i saved last week and so this is the power that we have today 00:09:20.780 |
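To make the bookmark manager idea concrete, here is a minimal sketch of the kind of functions such an MCP server might expose as tools. The speaker's actual server is not shown in the talk, so `Bookmark`, `BookmarkStore`, `save_bookmark`, `search`, and `saved_since` are all hypothetical names, and the in-memory list stands in for whatever storage the real server uses.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Bookmark:
    url: str
    title: str
    description: str
    saved_on: date
    notes: str = ""


class BookmarkStore:
    """In-memory stand-in for the storage behind a bookmark tool."""

    def __init__(self) -> None:
        self._items: list[Bookmark] = []

    def save_bookmark(self, url: str, title: str, description: str,
                      notes: str = "", saved_on: Optional[date] = None) -> Bookmark:
        # The date defaults to today, matching "gives it a date" in the talk.
        bookmark = Bookmark(url, title, description, saved_on or date.today(), notes)
        self._items.append(bookmark)
        return bookmark

    def search(self, topic: str) -> list[Bookmark]:
        # Case-insensitive match on title, description, and notes.
        needle = topic.lower()
        return [b for b in self._items
                if needle in f"{b.title} {b.description} {b.notes}".lower()]

    def saved_since(self, days: int, today: Optional[date] = None) -> list[Bookmark]:
        # Answers queries such as "the ones I saved last week" with days=7.
        cutoff = (today or date.today()) - timedelta(days=days)
        return [b for b in self._items if b.saved_on >= cutoff]
```

An assistant connected to such a server would translate "find my bookmarks on MCP" into a `search("mcp")` call and "the ones I saved last week" into `saved_since(7)`.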
but with that being said i think it's time that we all start building banjo is going to take over but 00:09:27.500 |
if you open your laptops and log on to this link this is going to take you to a workshop environment 00:09:34.060 |
where you have access to an amazon account where banjo is going to walk you through building out 00:09:39.740 |
today's workshop i thank you for your time cool all right so uh this is going to be a hands-on workshop so 00:09:46.540 |
we've provisioned an aws account for everybody here so you don't have to install anything on your 00:09:51.180 |
computer everything's going to be done through the browser and i always say the hardest part of the 00:09:56.220 |
workshop is just getting started so some of my colleagues are also here so raise your hand aws 00:10:00.220 |
folks that are here to support so we're going to take some time to just get logged into an environment 00:10:05.340 |
we're going to set up a vs code server enable models get the nova act api key so again this is the 00:10:11.340 |
hardest part of the workshop that's getting started so let's take some time to just get into the environment 00:10:16.620 |
and i'll follow along as well so and this is uh again everything is you don't have to install 00:10:23.020 |
anything on your computer you don't have to use your own aws account everything is provisioned for 00:10:26.380 |
you so but while that's loading i'm going to briefly walk through the three modules of the workshop 00:10:30.540 |
so the workshop is really about how you can use nova act so the first module is just getting started 00:10:36.140 |
with nova act we're going to make an api call the second module is going to make an mcp 00:10:41.420 |
server that can leverage nova act and finally we're going to use the strands agent to hook 00:10:45.900 |
everything together so that's kind of the three steps we'll go through at this workshop and all the 00:10:50.380 |
code is available via the link on github so you can try it on your own as well uh but yeah trying to get 00:10:56.940 |
started here if you can't follow along i'm going to be doing it up here so don't worry too much and again 00:11:01.740 |
all the code is available so you can try it out offline okay so first things first uh if you're 00:11:09.820 |
following along make sure to click this open aws console button again we provision the aws account 00:11:15.180 |
you know don't log into your own aws account don't try to create a new one everything is uh provisioned 00:11:20.460 |
here already so i'm clicking that button to open up the aws account so logged into my aws account 00:11:34.940 |
so the first thing we do in the aws account we're gonna enable uh amazon bedrock models so 00:11:39.740 |
think of amazon bedrock as a serverless api to access different foundation models and you can build lots 00:11:45.340 |
of agentic applications in it so it has capabilities like knowledge bases guardrails you can build agents 00:11:51.580 |
on top of it for anything you need to build uh ai agents or agentic applications amazon bedrock has 00:11:57.740 |
capabilities for that but for this workshop we're just gonna enable specific models so to enable 00:12:04.860 |
specific models you can click the amazon models and then we'll use claude 3.7 3.5 haiku and 3.5 sonnet 00:12:13.020 |
so those are the ones we're going to use for this workshop 00:12:21.020 |
and again all the instructions are also in this workshop as well so we could follow along but i'm 00:12:31.820 |
just going to go through it just for sake of time and then the next part once we get the model access 00:12:37.500 |
there's a vs code server that has everything set up already so i'm just going to go in there 00:12:47.100 |
and if the url and password is there you can log into your vs code server with everything installed 00:13:04.220 |
i'm also going to log into amazon q so amazon q is an ide extension to help you write code 00:13:14.860 |
if you have time you can sign up through a builder id completely free you don't need uh an aws account 00:13:19.740 |
you don't need to put in your credit card you can just log in through there i already have an account 00:13:24.220 |
so just feed it up but it puts a nice little ai agent there you can ask questions update code etc so 00:13:31.980 |
it's uh i'll show you some examples i'll just go through some of the code so 00:13:36.300 |
so who's gotten to this point setting up all the models for the workshop because once you get all this 00:13:44.300 |
done then that's when the real fun begins so that's just getting a pulse on if i need to 00:13:49.340 |
slow down a bit okay i'll wait a bit again raise your hand if you're stuck anywhere questions 00:13:54.860 |
we have uh agents that can come around and support you so i'm going to pause for a little bit 00:14:09.660 |
oh yeah so this workshop uh again all the code is available online uh this workshop is available as 00:14:15.180 |
well so you can also look through that there's a website called workshops.aws 00:14:22.860 |
and when you go there you can search for something like uh nova act and then it's the only workshop that shows 00:14:29.420 |
up so you can always go to workshops.aws search nova act and this workshop will show up so you can 00:14:35.180 |
see all the instructions all the code and run this uh on your own 00:14:52.300 |
okay and then last thing uh because we're going to use nova act we actually need to get a nova act api 00:15:00.460 |
key so if you go to nova.amazon.com this is a website where you can use the amazon nova models so 00:15:07.900 |
you can do like chatting generating images uh speaking with nova uh generate videos but then also 00:15:14.140 |
this is where the act api key is generated so if you're following along and you want to generate your 00:15:20.700 |
key again it's free to log in you can use your amazon.com account uh like when you order something on 00:15:26.620 |
amazon.com to log into this and then you can just generate a key here and they'll be able to 00:15:49.420 |
oops okay so i'm going to walk through what module one is uh before that again has anybody gotten 00:15:56.380 |
in here just quick pulse check if not you know i'll continue i know the wi-fi is slow so it might be 00:16:01.580 |
hard so i'll just continue on uh but yeah the first one we're going to see how nova act works uh 00:16:08.700 |
how the actual code looks uh i generated the key i need to export the key and then kind of run 00:16:14.780 |
the first script which is actually going to open amazon.com 00:16:18.140 |
uh and we're actually going to look for the first coffee maker so let me see how that code looks 00:16:27.900 |
so very simple code uh with nova act it's again it's all in a python sdk so i decide what page to go to 00:16:43.980 |
so go to amazon.com i say i want you to search for a coffee maker i say select the first result and i say 00:16:50.300 |
get the title of that product page so uh very simple if you've ever done kind of web automation before 00:16:55.740 |
with something like uh selenium or playwright you probably have to like look for this div tag you 00:17:00.380 |
know look at this h1 tag grab this information a lot of manual processes of actually inspecting 00:17:06.060 |
the actual website here i'm just saying click the search bar find something like i don't have to 00:17:10.940 |
specify click this tag do that so it makes it much easier to engage with the website as a natural 00:17:16.300 |
human would instead of like looking through divs and trying to find this p tag specifically so 00:17:21.900 |
uh this is a great way to just uh you know use nova act right out of the box so i'm going to uh run this 00:17:31.500 |
example all right so i added my key let's see what happens when i run this file 00:18:21.100 |
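The script being run here is described only in words, so the following is a rough sketch of its shape rather than the workshop's actual file. It assumes a session object with an `act(prompt)` method that takes a natural-language instruction; `find_first_coffee_maker` and `DryRunSession` are hypothetical names, and the dry-run class only records prompts so the flow can be read without a browser or API key.

```python
# Natural-language steps as described in the talk, in order.
STEPS = [
    "search for a coffee maker",
    "select the first result",
]
TITLE_PROMPT = "return the title of the product page"


def find_first_coffee_maker(session):
    """Send each instruction to the session, then ask for the product title.

    `session` is assumed to be any object with an `act(prompt)` method.
    """
    for step in STEPS:
        session.act(step)
    return session.act(TITLE_PROMPT)


class DryRunSession:
    """Hypothetical stand-in that records prompts instead of driving a browser."""

    def __init__(self) -> None:
        self.prompts: list[str] = []

    def act(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return f"[dry run] {prompt}"


if __name__ == "__main__":
    session = DryRunSession()
    print(find_first_coffee_maker(session))
    print(session.prompts)
```

As I understand the Nova Act Python SDK, a real session would come from its `NovaAct` class opened on a starting page such as amazon.com; treat that detail as an assumption and check the SDK documentation before relying on it. Note there are no CSS selectors or tag lookups anywhere, which is the contrast with Selenium or Playwright drawn in the talk.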
i gotta run it with this let's start that over 00:18:36.700 |
so just to explain why banjo is running that command it's running xvfb it's a frame buffer because 00:18:47.660 |
nova act actually goes and clicks a mouse on a browser that's why it needs to be run like that 00:18:52.780 |
otherwise it has no gui so this is just kind of a way to emulate a graphical user interface on this 00:18:58.540 |
linux box thank you daco yeah since we're running everything in the cloud in a browser i'm saying 00:19:04.060 |
you know open a browser again but it's already in a browser so that's why it crashed so i have to put 00:19:07.980 |
in that frame buffer command uh and yeah the workshop kind of walks through why we did that but you can 00:19:13.820 |
see uh what is going on when nova act says i'm going to search for a coffee maker i'm at the amazon home 00:19:19.580 |
page my task is to search for this so it's understanding what it's doing i see the search 00:19:23.820 |
bar has coffee maker i'm at the search bar here and now it actually puts the actual log in an 00:19:30.460 |
html file so it's taking screenshots you can see what it looks like it got the first results i'm on 00:19:35.900 |
the coffee maker page it selected it and now i've got the title now it says you know what's the title of 00:19:41.580 |
this product page all right got this black decker 12 cup coffee maker my task is to return the title of the 00:19:51.900 |
product page product title it got that and it ended the session and then it also creates a video log that 00:19:58.380 |
i can actually look at to see everything it did in this webm file so yeah question 00:20:06.540 |
yeah we'll repeat that for the microphone 00:20:19.180 |
so the question was does it reason about the page in terms of pixels or in terms of text 00:20:30.140 |
yeah so it's actually looking through the actual uh the page itself so you see in this video it sees 00:20:36.460 |
it looks at the page it can see what's in the page so it's a large language model trained so it can 00:20:41.660 |
actually see what the page is doing so it's not looking at like the h1 tag or whatnot it can 00:20:47.100 |
understand the context of that particular page it can see that's a search box okay i'm going to go 00:20:51.900 |
click through that search box so yes it understands what's on that actual page at the pixel level 00:20:56.620 |
so this is kind of the video it's hard to see let me make it bigger it's sped up 00:21:06.380 |
so it opens the page it's able to type in coffee maker there 00:21:10.700 |
um it gets that information clicks the button so even with all the ads and everything the model can 00:21:16.860 |
understand the task clicks that and it gets the information back so that's and that was a couple 00:21:21.900 |
lines of code so you can extrapolate to other type of workflows you can do for searching through things 00:21:27.420 |
sorry i have another question yeah so when you what i've experienced with these kind of frameworks is 00:21:33.580 |
that when you run this on a server environment um services like cloudflare will block the access and 00:21:39.740 |
maybe do a captcha challenge how do we solve that using q yeah so with uh so using amazon nova act so it 00:21:46.620 |
doesn't do captchas or things of that nature so it's meant for like workflows you understand but 00:21:51.020 |
yes it's not going to bypass captchas and other things of that nature as well so it's made for 00:21:55.420 |
like going to amazon.com or looking through a booking site but if it's something that like requires a human 00:22:02.380 |
you wouldn't use nova act for that use case if you need to pass a captcha or something else you'd 00:22:10.860 |
use another technology this is not meant to like overtake humans you know it's more like i'm helping 00:22:15.420 |
them augment things but not if there's a captcha involved you'd have to use a different technology for 00:22:19.580 |
that it's also a preview yes this is a research preview as well so if that's 00:22:25.900 |
a very good use case you know leave feedback on the nova website so yeah is human in the loop 00:22:31.980 |
possible at all with it yet well with this one no because i'm writing all the code here so but 00:22:39.500 |
again this is python code so i could probably put in something here like you know ask something make 00:22:44.220 |
an api call here so this is you know it's python code you might be able to create some type of uh 00:22:49.660 |
workflow that might wait for a human response or whatnot because the browser is happening 00:22:55.180 |
in like headless mode but could you make it work with a browser that a human is also seeing at the same time 00:22:59.820 |
yes yeah so it can pause and wait for somebody to put in like a password credentials or do a captcha 00:23:04.940 |
and then once it receives that it continues on with the workflow you could do that yeah because right now 00:23:09.740 |
i ran it in headless mode but yes it can also run uh you know to open up the browser if i ran this on 00:23:14.380 |
my macbook it would open up a chrome browser and go through that session also if you're running it and 00:23:20.220 |
you wanted to bypass something that's two-factor if you're already logged into say amazon.com and then you 00:23:27.180 |
run the code it's going to use your credentials in that browser session to continue on to perform that task 00:23:33.180 |
so that's something that you can do as well cool so let me oops and then one other thing you can also 00:23:44.140 |
do multi uh you know parallel execution so my next example is actually i'm trying to find 00:23:49.900 |
multiple monitors and i want to compare them all at once so i'll show you how that code looks like 00:24:02.540 |
so i can check for the monitor extract information i'm setting you know i want i'm defining what i 00:24:09.900 |
want so again i'm you know saying i want to find the price the rating the size uh go to amazon.com 00:24:17.260 |
uh i set it to headless mode this time so i don't need to do the frame buffer i start multiple threads 00:24:22.460 |
it looks for each monitor simultaneously because each of these are individual tasks so i can parallelize 00:24:27.340 |
them instead of waiting for it to go through i define the list of monitors i want to go through start the 00:24:33.260 |
thread and then it starts executing and finds the results of the monitors so i can run that in the 00:24:38.700 |
background it's starting with three parallel threads and it's open so again running in headless mode so 00:24:49.500 |
it's going to be able to do this in the background where we can see kind of what the model is thinking 00:24:53.340 |
how it navigates through the web page yep all right i tried to use nova act this past april and uh it worked 00:25:03.900 |
the first time but once i did it again it triggered the captcha is this something that has been already 00:25:10.780 |
resolved or is this happening because i think the website and it was amazon in this case it was detecting 00:25:16.780 |
it was a bot and uh is there like an llms.txt or robots.txt that can declare it so nova act has a github repo 00:25:25.340 |
so you could go there and just grab that but it's working now like i'm running it you know this is 00:25:30.940 |
live code i'm doing right now like i just exported my api key started running it so 00:25:36.380 |
uh you can try it in the workshop i'll be yeah i mean it's ready to go we're building right now 00:25:41.660 |
and you can kind of see that it's going on in the background what it's doing 00:25:47.100 |
i've looked at this monitor the dell monitor i'm at the amazon home page it's like it's going through 00:25:52.700 |
looking through the search results it's saving things so you can see it's running in parallel it got the 00:25:57.100 |
information for one of the first ones so it's going as i just set that up and it can execute 00:26:02.620 |
that so if you have some type of uh i don't know like daily news thing you need to go to the website 00:26:07.340 |
and get news or something and i have a report nova and there's no api for that this is one way you can 00:26:12.300 |
codify how to do that kind of search and get the information 00:26:17.500 |
so if you have a question yeah i'm wondering so how successful is this in terms of like more 00:26:22.380 |
ambiguous tasks because i i ran the amazon demo and that worked but i'm wondering could i just add google 00:26:27.020 |
there sure and and and how like how big and and sort of how much does it know when it's navigating 00:26:33.580 |
through like i was thinking like if i wanted to return a pair of sunglasses that broke would i would i be 00:26:39.820 |
able to just say like start in google and then find this company's website find a way to you know 00:26:45.340 |
engage support open a ticket like how much sort of like how vague can you be and how smart is it 00:26:52.300 |
currently would that would that yeah i mean the more instruction you give obviously better but 00:26:56.780 |
it's able to understand how to navigate a website that's what the model is trained on so if you say 00:27:00.620 |
you know go to this sunglasses website it probably wasn't trained on a specific sunglasses website 00:27:05.180 |
but it can understand that button says support you know click that to open a ticket so it understands kind 00:27:10.060 |
of the general knowledge of how to navigate the website but if there's something very intricate 00:27:14.140 |
about that website you're going to have to encode it in the text like make sure you click 00:27:17.740 |
button x first or whatever so it understands how to navigate websites got it and does it understand 00:27:23.500 |
when it's failed yeah sometimes i've seen it sometimes get stuck in a loop and like oh no 00:27:29.020 |
i keep scrolling i keep scrolling i keep scrolling it doesn't know when to stop so again this is in 00:27:33.180 |
research preview so things are getting better the model is getting updated behind the scenes but 00:27:37.420 |
it's not like it's not agi so that's got it and one last question um how is it in terms of navigating 00:27:44.540 |
like distrustful parts of the internet i mean there's a lot on the internet that we see and we know is not 00:27:49.420 |
to be trusted or it's something not to be followed how have you sort of worked around that problem yeah 00:27:54.620 |
because again it is a model in the background so it's going to understand like if you're doing something 00:27:58.700 |
it's not going to want to click that or there might be safeguards in place so that's built into 00:28:03.820 |
the model but again uh it is in research preview you still have to explicitly say what buttons to press 00:28:09.100 |
for certain actions but again the model is a trained llm it's going to be able to understand the nuances 00:28:14.780 |
and say if it can't take this action or can't do that that could happen but i haven't seen that use 00:28:19.580 |
case but if you keep pushing it maybe you'll find those those things well the thing i had in my mind is 00:28:23.580 |
like if you go to a site where you have to download a link sometimes there's an ad that says download 00:28:28.860 |
a link and you know that that's just an ad trying to get your attention of course would the model know 00:28:33.100 |
or is that some yeah if like for example like in the the amazon.com it shows an ad for something but 00:28:39.020 |
i said find the first thing was able to scroll past that ad and click something so the model understands 00:28:43.180 |
the task you give it so yes it can understand that thank you all right so this this is finished yeah 00:28:51.420 |
that's really quick it showed it was able to find all the monitors give me the size the rating 00:28:55.820 |
the price for each of the monitors so again it executed that in parallel it got me the nice information 00:29:01.420 |
and that that's kind of the idea of like it can do parallel execution in the background so you don't 00:29:05.740 |
have to wait for it and don't have to see it actually clicking through the task and you get your information 00:29:10.060 |
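The parallel monitor comparison just summarized can be sketched with Python's standard `concurrent.futures` module. This is an assumption-laden illustration, not the workshop script: `look_up_monitor` is a hypothetical placeholder for one headless browser session per monitor, the third monitor name is invented for the example, and the price, rating, and size fields are left empty because no real lookup happens here.

```python
from concurrent.futures import ThreadPoolExecutor

# The Dell and Samsung names come from the talk; the third is a placeholder.
MONITORS = ["Dell monitor", "Samsung Odyssey", "Example monitor"]


def look_up_monitor(name: str) -> dict:
    """Placeholder for one independent browser task.

    A real version would open its own headless session, search for `name`,
    and extract the fields below from the product page.
    """
    return {"name": name, "price": None, "rating": None, "size": None}


def compare_monitors(names: list[str]) -> list[dict]:
    # Each lookup is independent, so they can run in separate threads.
    # pool.map returns results in the same order as the input names.
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        return list(pool.map(look_up_monitor, names))


if __name__ == "__main__":
    for row in compare_monitors(MONITORS):
        print(row)
```

Threads suit this workload because each task spends most of its time waiting on the browser and the network rather than on the CPU.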
all right one more question then we'll move on to the mcp part so nova is specifically meant to be used 00:29:18.060 |
with the browser correct uh so nova act yes so amazon nova is a family of models from amazon so if you go to 00:29:26.220 |
this website nova.amazon.com you see there are different foundation models like nova pro premier 00:29:32.540 |
lite micro these are like the text understanding models so like your typical llm calls there's also an 00:29:38.940 |
image model called nova canvas that can generate images there's the video model called nova reel which can generate 00:29:45.100 |
videos and there's also a speech model speech to speech called nova sonic so nova is a family 00:29:52.140 |
of uh foundation models by amazon to do all these types of tasks and act is just another one for 00:29:57.820 |
browser automation are there plans to expand this like beyond the browser so that we can someday take 00:30:04.780 |
actions in slack or ide or anything outside of the browser maybe some of the team is here so maybe talk with 00:30:13.020 |
them later thank you all right so i'm going to move on to the mcp part banjo yep is nova act only available in 00:30:24.540 |
the u.s yes right now nova act is only available in the u.s it's in preview so it's just getting started so 00:30:32.140 |
if you log in from like a different uh account like an address in the uk or something it won't work so 00:30:37.580 |
it only works in the u.s at the moment yes all right one more question over there and then i'm going to 00:30:45.260 |
move on with the three monitors um i got the same results as you did but i actually got a different 00:30:50.540 |
price with the samsung odyssey why do you think that might be oh 00:30:54.060 |
your amazon.com is different i don't know yeah because it is opening up a different browser so it could have 00:31:03.580 |
clicked something differently yeah so yeah that's right 00:31:07.980 |
we can actually look at the video playback to see what the results were like 00:31:14.140 |
one more okay one more quick quick one are there plans to support persisting browsing data such 00:31:23.980 |
as cookies in the cloud browser so right now it's opening up its own browser but you can also set like 00:31:30.060 |
your own like chromium profile and open up that browser so all the things you have saved there like 00:31:34.940 |
you want to log into your stuff you can set your own custom browser but by default it opens up a new 00:31:39.420 |
like completely new browser without anything saved 00:31:41.900 |
all right so i want to show uh i actually made an mcp server for nova act so module two is going through uh mcp 00:31:53.660 |
and i can kind of show you what i did for the mcp server uh in fact we can use the amazon q here so i'm 00:31:59.740 |
going to ask it uh can you tell me about the nova act mcp server 00:32:14.620 |
can you tell me what it does oops 00:32:25.820 |
so you can see it's going through uh it integrates nova act browser automation with mcp it has the browse session tool 00:32:34.060 |
browser action execute parallel tasks take screenshots close browser list results so i created these different 00:32:40.380 |
aspects of the mcp server so i could use something like claude desktop or cursor or amazon q cli to 00:32:46.780 |
say you know open amazon.com and find information for me so it's portable it understands uh so i 00:32:53.020 |
don't have to actually write code i can say go to amazon.com and find me the first coffee maker 00:32:58.060 |
it'll actually write all that code i did in the initial one to do that or the multi-monitor so i wrote a 00:33:03.660 |
bunch of code to do this if i just said you know get me these three monitors to get the price it would 00:33:08.460 |
actually write all the nova act code it needs to do that using the mcp server so that's kind of the 00:33:13.100 |
power of mcp that i just describe a task and then i can encode the actual browse action things it needs 00:33:19.100 |
to so and then i also made an mcp client that can actually interpret that so oops it connects to the 00:33:28.140 |
mcp server it runs the code and is able to use query bedrock uh i am using a model so i'm using claude 3.5 00:33:35.660 |
sonnet here because it's an mcp client and needs to have an lm behind that and then it's able to 00:33:40.860 |
you know understand which tools to use uh run the code and open up the browser and whatnot so let me 00:33:55.100 |
so er open the file just did that we asked amazon q to explain the file to us and now we're actually 00:34:04.540 |
going to run it so python 3 and then i can open this up 00:34:23.820 |
okay so let's be adventurous can somebody give me a query to try if anyone has an idea i'm going 00:34:30.940 |
to just ask it to do something so someone give me an idea of what to run in nova act fix wi-fi 00:34:37.340 |
how would you fix the can you find a website to fix website can 00:34:46.780 |
fine fine let's see website to fix wi-fi use headless mode 00:34:59.900 |
all right it goes to google.com types how to fix wi-fi problems troubleshooting guide in the box and presses enter 00:35:12.540 |
return a list of the website title descriptions all right it's going through that so it opened 00:35:19.100 |
google.com how to fix wi-fi problems i see an empty search bar where i can type queries for search 00:35:25.420 |
information i should type how to fix wi-fi problems so you can see it's understanding what to do 00:35:29.740 |
you know oh it hit a recaptcha page so okay the search results are not available blah blah so 00:35:38.140 |
so see it looks like it got stuck on a recaptcha page so this is like a headless agent so someone 00:35:42.220 |
asked a question about getting past captchas and whatnot you see that it's it got stuck doing that 00:35:46.220 |
it looks like it's stuck in a loop now so it sees the captcha again so i should click the skip button to 00:35:53.740 |
skip the captcha window the captcha is still open so it's probably going to be stuck here unless i close 00:35:58.620 |
it so you can see there are limitations it's not going to pass captchas and whatnot but that's that was a good 00:36:03.580 |
query to show that it oh did it fill it it's still open so it's going to be stuck here so i'm just going 00:36:08.300 |
to close it out but you can see you know it can't pass everything it can't navigate through all websites so 00:36:13.260 |
something like that will not work so that was a great test example to show 00:36:17.820 |
if i use the the baked in one you know find a coffee maker under 50 dollars it'll be able to go through 00:36:24.860 |
that and use headless mode but any questions on that seeing how the mcp server is working i didn't have to 00:36:30.380 |
write code i just said do something it actually wrote the code to do it for me 00:36:37.500 |
yeah yeah so a question about if i can actually go into the browser and do it myself yeah if i ran this 00:36:46.060 |
locally on my machine it will actually be able to you know open up the browser and i can click 00:36:50.540 |
the button and it'll continue doing that right now i'm running it within the browser so i'm running 00:36:56.220 |
everything in headless mode so we can't interact with that 00:36:58.540 |
so you can see it's able to search for under 50 dollars it can actually look at the website 00:37:06.940 |
it's found search results on amazon.com so yeah so for that use case where we're not hitting 00:37:12.860 |
captchas it is able to continue and find the information there 00:37:20.300 |
so a question about can i actually order something if i use my own browser session and like logged in 00:37:25.100 |
to my amazon.com account and said yes order this for me you know click through it will be able to 00:37:30.140 |
understand that thing but i would have to put it in i would have to use my own browser session so i 00:37:44.460 |
if you give nova act the authentication for amazon for example like you give it your login details 00:37:53.100 |
then can it log in and complete that action for you yeah if i say this is my username this is my 00:37:58.860 |
password enter that into that field it'll be able to understand you know this is a sign-in button and i 00:38:03.820 |
have this information but again this is all python code so yeah you can encode it you can make it an 00:38:08.380 |
environment variable so it won't read it directly so a lot of ways to do that does it also like 00:38:12.540 |
understand 2fa let's say it asks you to go to your gmail and it will then open the gmail website check 00:38:19.180 |
the email if you're logged in again on your session and then input it or is it okay well if there's no 00:38:24.380 |
captcha like we just saw with the captcha yeah so there's nothing blocking so but yeah again nova act 00:38:29.820 |
is free to use so there's a lot of creativity in this room so i think we should have like a 00:38:33.500 |
nova act hackathon i think that'll be you know do something crazy with nova act 00:38:37.260 |
all right so one more question yep one more can i book a flight when my price alert is less than 00:38:48.620 |
a hundred dollars it's like a continuous check you'd probably use something else for that but yeah i mean 00:38:54.060 |
nova act can open up that website you can just have a query every day you know open google flights and 00:38:58.940 |
look at the quickest thing and if something is below this threshold you know send me an email so again 00:39:03.580 |
this is all a python script so you can set up something that triggers like once a day like in a 00:39:08.300 |
lambda function and so yes totally possible so nova act is very flexible and because it can run in headless 00:39:14.380 |
mode you don't need to have that ui so that's really what makes it helpful for interacting with websites 00:39:27.740 |
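The once-a-day price check described here could be sketched roughly as follows. This is a standard-library-only illustration, not Nova Act or AWS Lambda code: `PriceAlert`, `check_price`, and `handler` are hypothetical names, and `fetch_price` is an assumed stand-in for whatever headless browser session actually reads the price off the page.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PriceAlert:
    """Hypothetical alert definition: a route label and a price threshold."""
    route: str
    threshold_usd: float


def check_price(alert: PriceAlert, fetch_price: Callable[[str], float]) -> Optional[str]:
    """Return an alert message if the fetched price is below the threshold, else None."""
    # fetch_price is an assumption: in the talk's setup it would wrap a
    # headless browser session that looks up the current price.
    price = fetch_price(alert.route)
    if price < alert.threshold_usd:
        return f"{alert.route}: ${price:.2f} is below ${alert.threshold_usd:.2f}"
    return None


def handler(event, context=None, fetch_price=lambda route: float("inf"), notify=print):
    """Lambda-style entry point meant to be invoked once a day by a scheduler."""
    alert = PriceAlert(event["route"], float(event["threshold_usd"]))
    message = check_price(alert, fetch_price)
    if message is not None:
        notify(message)  # e.g. send an email; here it defaults to print
    return {"alerted": message is not None}
```

Keeping the threshold comparison separate from the browser step means the scheduling and notification logic can be tested without opening a browser at all.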
thanks yeah this is pretty cool i'm a little bit confused like we have the nova act sdk api key and 00:39:37.500 |
we're also doing some stuff in bedrock ah yeah so how does this actually work yeah so the nova act 00:39:45.020 |
api key is separate but for this mcp client i made it actually needs a large language model to understand 00:39:51.580 |
what's happening so if i go to claude oops i actually said i'm actually using claude sonnet 00:39:59.820 |
3.5 for my mcp server so that's how because i just asked it you know find that website for me 00:40:05.820 |
how does it know about any of the code doing that so it's using a large language model underneath 00:40:10.780 |
the hood to actually find that information so that's where we use bedrock i'm trying to find it in the 00:40:17.020 |
code but it's sonnet yeah i set the model id so you're an ai assistant helping you have tools 00:40:24.860 |
you're using claude 3.5 sonnet you're making an api call to bedrock whenever something happens so 00:40:30.380 |
that's where the the llm we're using comes in but nova act is separate from that so instead of using you 00:40:35.820 |
know claude desktop which is running an llm inside of it that would understand the mcp server 00:40:47.340 |
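The split described in this answer, where a language model decides which tool to call and the client then runs it, can be illustrated with a small standard-library sketch. It does not use the real MCP wire protocol, the Bedrock API, or Nova Act: `fake_model`, `browse`, `take_screenshot`, and `run_client` are all hypothetical names, and the model is a stub that returns a JSON tool request.

```python
import json
from typing import Callable, Dict


# Hypothetical stand-ins for tools a browser-automation server might expose.
def browse(url: str) -> str:
    return f"opened {url}"


def take_screenshot() -> str:
    return "screenshot saved"


TOOLS: Dict[str, Callable[..., str]] = {
    "browse": browse,
    "take_screenshot": take_screenshot,
}


def fake_model(prompt: str) -> str:
    """Stub for the LLM call; a real client would send the prompt and tool list to a model."""
    if "screenshot" in prompt:
        return json.dumps({"tool": "take_screenshot", "arguments": {}})
    return json.dumps({"tool": "browse", "arguments": {"url": "https://www.amazon.com"}})


def run_client(prompt: str, model: Callable[[str], str] = fake_model) -> str:
    """Ask the model which tool to use, then dispatch the call to that tool."""
    request = json.loads(model(prompt))
    name = request["tool"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**request["arguments"])
```

The point of the sketch is only the division of labor: the model produces a structured tool request, and the browser automation itself sits behind the tool, independent of which model made the choice.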
uh the question is uh does it integrate with browser plugins as well like could it integrate 00:41:00.380 |
with lastpass if you have the lastpass plugin fill in the credentials through lastpass and then 00:41:05.020 |
continue i haven't tried that but again you can set it up to use your own browser so if you do that 00:41:10.220 |
and if that's integrated it might be able to do that and click through that but i have not tested that but 00:41:15.020 |
something to try out thank you the biggest problem you will face is two factor like even if you gave 00:41:22.380 |
it a password like if you're using something like google authenticator or something that would be like 00:41:27.180 |
the biggest problem like captcha but other than that if you provide it an environment variable or if you 00:41:32.380 |
give it instructions on how to access lastpass in the browser you should be able to do it all right and 00:41:38.860 |
uh oh one more question then we'll go on to the last module 00:41:46.220 |
so clearly there are a lot of different uh agent architectures you could use um and what i can 00:41:58.300 |
imagine using this is uh like you have a coordinator agent set up somewhere that's running in the overall 00:42:04.860 |
app and then when something pops up and says hey you need to go and look this up online go and check it 00:42:10.220 |
uh it should mod so my question is how modular 00:42:14.220 |
i mean it's just python so it should be pretty modular right is that the way in which you're 00:42:19.740 |
imagining the architecture to be is just if i was coding a coordinator agent in langchain or langgraph 00:42:26.300 |
for example it would then call your sub-agent and run its stuff and then get 00:42:33.100 |
a text-based output that i throw into my message queue that's how it all integrates together is that right 00:42:40.220 |
yeah that's one way you can do it so nova act again right it's python so it could be a tool it could be 00:42:44.620 |
an api call and the next module we're actually going to show you how to actually make an agent from that so 00:42:49.260 |
good tee up right here uh so um du'an talked about strands uh at the beginning so strands is a new 00:42:57.100 |
agentic framework launched by uh aws so let me open up the link uh it's as easy as a pip install strands 00:43:04.540 |
and the first agent is like agent equals that so it's very it's a model first uh way of interacting with 00:43:11.500 |
agents if you use a lot of agent frameworks in the past there's a lot of bootstrapping and making sure 00:43:16.460 |
everything is correct and like but that was necessary for kind of the older models like 00:43:20.380 |
if you think back to like like llama 2 for example like how how far our models have evolved since then 00:43:25.740 |
so but now we we can skip a lot of the you know bootstrapping we did previously the agent can figure 00:43:31.900 |
that out so we don't need all these very uh heavy ways and like you know make sure everything's typed 00:43:36.940 |
and whatnot so here's a very simple example of how i actually spun up uh and also it has mcp 00:43:44.060 |
native support so in this example i actually have two mcp servers uh i have the aws documentation and 00:43:51.500 |
aws diagrams mcp server so if you go to this like aws labs mcp these are the official um aws mcp servers 00:43:59.980 |
and there's a bunch of different ones from like a cost analysis nova canvas diagramming cloud formation 00:44:06.460 |
lots of different ones here uh so again it's all on github aws labs mcp but the example i do here is 00:44:13.900 |
i'm actually i made like a solutions architect agent your role is to help customers understand 00:44:18.780 |
building on aws and i define these two mcp servers here i give it a prompt and i say this agent 00:44:27.020 |
has all the tools in the mcp servers it has a bedrock model i'm using claude haiku here and what's cool 00:44:33.740 |
about strands is it can also use like litellm and ollama so it has access to lots of different things or 00:44:39.340 |
you can run it locally and of course it has access to amazon bedrock so that's what we're using here 00:44:43.500 |
so all those three things make the agent the tools the model and the system prompt and then i can say 00:44:50.540 |
get the documentation for aws lambda and create a diagram of a website that uses lambda so let me run this code 00:45:18.220 |
okay so it uses uv to install the mcp server locally a lot of people ask where does mcp run this 00:45:25.100 |
is running locally but there are other ways to run it like in a lambda function and whatnot but for just 00:45:29.660 |
testing it out it pulls down the the mcp server locally and runs it you can see it's already executing 00:45:35.260 |
so let's make this a bit bigger uh so it says okay i'm going to help you with that first i'm going to search 00:45:45.020 |
the aws lambda documentation i'll read the documentation then i'll create a diagram 00:45:49.260 |
illustrating a static site so you can see it does a post request to do the search so the mcp server 00:45:54.460 |
defines where everything is i don't have to like feed it in the well-architected framework the aws 00:45:59.900 |
documentation is always updated so it just knows call the search function it got the lambda welcome file 00:46:05.180 |
it put that in it's able to generate the diagram it generates the diagram it tells us what is going on how 00:46:12.540 |
the workflow looks like it tells me it saved the diagram to this location i can open it up generated 00:46:18.220 |
diagrams oops and now it's very small let me see if i can make this bigger 00:46:26.060 |
there we go so it was able to generate the diagram for me so all through that about uh you know 40 lines of 00:46:35.820 |
code i have two mcp servers i have my prompt and it is able to understand that get that and just generate 00:46:41.340 |
something for me with that uh so that's very easy to get started with strands of building agentic 00:46:46.700 |
workflows i know agent means a lot of different things to different people but you know if you have 00:46:51.580 |
tools the model the system prompt do some type of action and strands makes it extremely easy to do that 00:46:57.820 |
if i use other frameworks it could be a lot more code to do something like that especially integrating mcp 00:47:02.940 |
natively like that i'm going to pause here for any strands questions 00:47:15.020 |
um i know bedrock already had its kind of agents sdk so is strands replacing that or is 00:47:25.580 |
this supposed to complement that like is this the preferred way of creating agents with 00:47:30.620 |
models in bedrock yeah well when it comes to preferred way it always comes down to your use 00:47:34.860 |
case so bedrock agents has a lot more i guess opinionated ways to do things and you can do it through 00:47:40.220 |
the console it has built-in support uh right there in aws while strands is more it's an open source 00:47:45.900 |
framework so you can download the code you can use other models through that like litellm or ollama 00:47:51.100 |
if you use bedrock agents you can't run that offline so there's different use cases different developer 00:47:56.220 |
tooling i mean me as a software engineer i like you know code first doing things so it does 00:48:00.540 |
depend on your use case what you're trying to do and your experience okay can you show the code real 00:48:04.700 |
quick yeah yeah this is the code yeah just show the agent so this is an open source framework if you go 00:48:12.940 |
where it says agent and it says model right now we're using a bedrock model but you can use another 00:48:18.140 |
model with litellm yeah so you don't need aws at all in that instance 00:48:23.420 |
you can use ollama you can use openai you can use right 00:48:26.940 |
yeah so there's documentation anthropic litellm a lot of different model providers 00:48:37.660 |
ollama openai so it's an open source framework so you can use it however you want so but yeah that's the 00:48:43.100 |
idea of strands open source model agent development kit one question suppose i want to build a text to 00:48:50.860 |
sql agent and i have say 15 tools already built in that i want this agent to be able to use if i use this 00:49:00.060 |
framework how can i make sure that the agent knows when to use the right tool and the sequence yeah great 00:49:10.060 |
question uh so in this example i have a weather agent so one thing you said you already have tools 00:49:17.580 |
what i like about strands a lot is i can take a python function i already have and just put this tool 00:49:23.100 |
decorator and that's it you know you don't have to put anything else it understands this is the uh what you 00:49:27.900 |
need to do and then when i'm defining that agent i have this tools and i have put in the the native tools 00:49:33.900 |
we're going to be using http request is a standard tool in the strands framework so in this example i'm 00:49:40.220 |
like asking what is the weather in seattle and then also how many words are in this response 00:49:44.780 |
this is the open api api.weather.gov where you don't need an api key and it can find the information for 00:49:51.340 |
you so i'm going to update this to san francisco and this started wrong but it's going to figure it out a 00:50:01.100 |
weather example with the word count and i was very specific you know find the weather first and then 00:50:06.300 |
how many words are in the response so it's able to use that tool it gets the forecast and then it knows 00:50:11.340 |
to use that word count tool next so we're passing a lot of the information to the model the models are very 00:50:16.940 |
smart now we don't have to say do this do this do this just let the agent figure it out that's kind 00:50:21.500 |
of the role of the agent you give it the context and tools necessary it figures out the best way to 00:50:26.300 |
solve the problem but then wouldn't it be prone to hallucination when you give it 20 tools and then 00:50:32.300 |
because we've tried that with aws you know the similar things when you bind more i think more than 10 tools 00:50:41.020 |
it's going to sure it is always you know a balance but again the models are much better like try 00:50:47.180 |
using claude 4 sonnet is it hallucinating as much like these newer models are much better at 00:50:51.580 |
understanding the concept and understanding which tools to use the older models sure they get confused 00:50:56.620 |
there's so many things but i'm very confident in these newer models they can understand your use 00:51:00.860 |
case and what tools are available and figure out the best way to solve the problem so then with this framework 00:51:05.340 |
there wouldn't be a way for you to orchestrate a customized flow but more like you give the 00:51:12.620 |
control to the agent you could if you want to have like specific like do this a specific way uh there are 00:51:19.020 |
different ways in strands uh with something called workflow mode where you actually say uh you know this 00:51:26.060 |
is the workflow i want to do research results analyze things write the final report if you have to do 00:51:31.260 |
something very sequential strands has that i won't have time to go through all the different you know 00:51:36.540 |
ways to do multi-agent collaboration and whatnot but for that specifically like i want it to do x y 00:51:41.900 |
z first the workflow way can do that so yes then is it possible say um i i don't have a predefined 00:51:50.620 |
workflow but i know it needs to figure out the right workflow then then that's what i just did there 00:51:56.140 |
you know i just gave it a sentence and figured it out i see yes okay perfect thank you 00:51:59.580 |
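The tool-decorator idea from this exchange, where an existing Python function is marked as a tool and the agent then chooses among the registered tools, can be sketched with the standard library alone. This is not the Strands implementation: `tool`, `TOOL_REGISTRY`, and `describe_tools` are hypothetical names, and `get_forecast` returns a canned string rather than calling a weather API.

```python
from typing import Callable, Dict

# Hypothetical registry that a framework could hand to the model as its tool list.
TOOL_REGISTRY: Dict[str, Callable] = {}


def tool(func: Callable) -> Callable:
    """Register a plain function as a tool; its docstring doubles as the description."""
    TOOL_REGISTRY[func.__name__] = func
    return func


@tool
def word_count(text: str) -> int:
    """Count the whitespace-separated words in text."""
    return len(text.split())


@tool
def get_forecast(city: str) -> str:
    """Return a canned forecast; a real tool would call a weather API."""
    return f"Sunny in {city} with light winds"


def describe_tools() -> Dict[str, str]:
    """Build the name-to-description map a model would use to pick a tool."""
    return {name: (fn.__doc__ or "").strip() for name, fn in TOOL_REGISTRY.items()}
```

In the demo, the model is the one that decides to call the forecast tool first and the word-count tool second; in this sketch the equivalent sequence would be `word_count(get_forecast("San Francisco"))`.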
so claude 4 has something called interleaved thinking i believe that's what it's called where 00:52:08.140 |
it can handle multiple tools processing much better than most models today so if you're passing in 20 00:52:14.860 |
tools it's able to work through the agentic loop to really figure out which tool to run and it's also 00:52:21.340 |
able to run parallel tool calls so rather than just say okay here's the objective let me run this tool it 00:52:28.860 |
can say here's the objective let me run this tool this tool this tool this tool and this tool and then 00:52:33.340 |
process the results and determine what needs to happen next so i would try claude 4 which like 00:52:39.660 |
banjo mentioned then last example really quick uh again you know strands i made my nova act mcp server 00:52:48.220 |
and it can actually run that you know i defined this is the mcp server use the nova act mcp you know 00:52:53.900 |
use claude so same type of thing i could have another agent you know use uh nova act as well 00:52:59.500 |
so strands makes it very easy to build these agentic workflows uh so i really enjoyed the developer 00:53:06.220 |
experience of using strands and you know i already have the mcp server we see the same exact example 00:53:11.500 |
before so once you have the mcp server it's very easy to plug in into different uh architectures and 00:53:16.860 |
strands makes it very easy to accept that but yeah those were the three modules really about how to use 00:53:25.980 |
strands uh mcp then amazon nova act again strands is open source you can download it pip install strands 00:53:35.260 |
if you just type strandsagents.com it'll take you to the documentation uh again also nova act 00:53:41.340 |
nova.amazon.com it's free to log in and then i think that's all the time we have but we do have a 00:53:50.300 |
survey uh and you can get an aws credit code by filling out this survey so banjo i have a question about that 00:53:57.100 |
workflow thing in strands when you create these individual agents can you define which tools are 00:54:03.660 |
passed on to each agent yeah yeah it's a great question about different agents uh running out 00:54:08.220 |
of time but i'll quickly show uh i have a multi-agent example i believe 00:54:13.340 |
oh i think you had it in the docs yeah yeah it's in the docs yeah yeah yeah each of these is a different 00:54:21.980 |
agent so you know this is an agent you can have a different system prompt you have different tools so 00:54:26.700 |
you're just defining the agent and then yeah you can have different tools different whatever there 00:54:30.540 |
different models and then the workflow would just call that so yes completely customizable so that's 00:54:35.420 |
the good thing about strands it's very easy to customize and build uh scalable solutions like that 00:54:39.580 |
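The answer above, that each agent in a workflow gets its own system prompt and its own tools and the workflow simply calls them in order, could look something like this in outline. It is a standard-library sketch under stated assumptions, not the Strands workflow feature: `SimpleAgent` and `run_workflow` are hypothetical, and each agent's `run` applies its tools in order instead of letting a model choose.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SimpleAgent:
    """Hypothetical stand-in for an agent: a name, a system prompt, and its own tools."""
    name: str
    system_prompt: str
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def run(self, task: str) -> str:
        # A real agent would let the model choose tools; this stub applies them in order.
        result = task
        for tool_fn in self.tools.values():
            result = tool_fn(result)
        return result


def run_workflow(agents: List[SimpleAgent], task: str) -> str:
    """Run agents sequentially, feeding each agent's output to the next."""
    output = task
    for agent in agents:
        output = agent.run(output)
    return output


def search(query: str) -> str:
    return f"notes on {query}"


def analyze(notes: str) -> str:
    return notes.upper()


def write_report(analysis: str) -> str:
    return f"report: {analysis}"


# Each agent only sees the tools it was given.
researcher = SimpleAgent("researcher", "You gather information.", {"search": search})
analyst = SimpleAgent("analyst", "You analyze findings.", {"analyze": analyze})
writer = SimpleAgent("writer", "You write the final report.", {"write_report": write_report})

final_report = run_workflow([researcher, analyst, writer], "aws lambda")
```

The research, analyze, write ordering mirrors the sequential workflow mentioned earlier in the talk; swapping a tool or prompt on one agent leaves the others untouched.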
thank you and then again uh here's the survey you can get aws credits for filling out this thing 00:54:45.820 |
tell us how we did what you liked what you want to learn more and now go build 00:55:03.260 |
yeah any other questions i think we have a minute 00:55:12.860 |
thanks for the presentation um so as these systems develop i think that it's reasonable to assume that 00:55:21.900 |
they would emerge as an increasingly effective vehicle for committing fraud online at scale which would push 00:55:29.740 |
businesses to implement more things like captcha which kind of decreases the surface area that 00:55:35.580 |
tools like this would be applicable so what is the long-term strategy for that well you already saw we 00:55:41.660 |
failed with captcha today like you know we're not trying to hack captcha we're not trying to break things 00:55:45.820 |
you know a responsible ai is very important to amazon so no we're not trying to let this tool commit fraud 00:55:51.100 |
you know you have to have an api key so it could be monitored so use cases like that will be shut down 00:56:09.260 |
oh we can keep going we have more time oh the clock the clock ran out so i thought we were kicked out 00:56:19.980 |
all right well more questions then i guess i thought yeah another question 00:56:26.940 |
so regarding nova act let's say that i have a headless browser in the cloud is there a way to connect 00:56:42.540 |
nova act to my custom browser instance in the cloud yeah yeah yeah you can there's a way to like put your own 00:56:48.140 |
browser instance so yeah nova act supports that so oh pretty possible yeah thanks 00:57:17.180 |
i think right because it says start at one and then you have 120 minutes 00:57:20.700 |
i know i just told him okay but i think what happened that time 00:57:23.340 |
oh yeah no you guys can keep rocking it okay yeah 00:57:30.060 |
so yeah there's a way to set up your own user agent for nova act so definitely possible 00:57:44.220 |
there's a lot of time so i don't know if anyone actually got into the workshop so we can still 00:57:58.780 |
build some stuff or i can try some other examples 00:58:11.260 |
that's a lot of time so we can try to make a streamlit app with nova act so we can try that one 00:58:16.300 |
so we can try if that works 00:59:02.380 |
so one example i tried i tried to make a streamlit app that uh looks for like the top five uh playstation games on gamefaqs and then creates an image like a nice graph for me 00:59:16.460 |
but it can fail so uh i think that's one of the issues there i think it failed at one of the steps 00:59:22.140 |
there uh let's see oh that nova act got an error so it couldn't navigate gamefaqs.com so it does it does 00:59:33.660 |
fail at some of the things so that's you know again research preview you have to be more specific on how it goes 00:59:39.020 |
through things uh oh yeah let me show you where the code is just so you can have an example let me pull up 00:59:45.980 |
yeah let me try let me set up my local machine so we can see how it works 00:59:55.500 |
oh yeah go for it does nova act depend on like uh semantic html and like good web design to actually 01:00:12.700 |
work i mean it understands the actual page so it can click through those things but if the if the page 01:00:17.900 |
like doesn't have like a search box or button it would not be able to navigate so as long as it can see 01:00:21.900 |
the page it can understand where to click and then click those correct buttons so 01:00:25.740 |
i get maybe a follow-up is there any like efforts to do like experimental like engagement on the page 01:00:33.100 |
so if it comes on a page that it's not familiar with maybe it would try and act like a human would 01:00:38.380 |
to like click on things or try things out depends on what you you put in that prompt because again you're 01:00:44.060 |
creating that workflow what it should do so if you say you know explore this website and find things 01:00:49.020 |
it will try to click through that but again it's up to kind of what that initial prompt is that you 01:00:53.820 |
have for it yeah when you're using nova act you're kind of giving it step-by-step instructions when you're 01:00:59.260 |
using the sdk so that way if you kind of know it's an obscure website you can give it those instructions 01:01:06.540 |
that it needs to perform whereas the mcp server is using natural language to infer what needs to be 01:01:13.660 |
done so there's not specific instructions coming from you unless you provide it 01:01:27.100 |
yeah so i'm going to run it locally on my machine just to show an example let's see 01:01:32.140 |
oh let me hide my key for a second because it's being recorded 01:01:59.900 |
all right so i'm just running it locally on my machine so without headless mode so you can see 01:02:17.180 |
so what we're looking at now is not in headless mode this is actually nova act actually 01:02:24.620 |
performing the task in the browser so yeah a lot of questions about how does it work you know 01:02:30.220 |
and we can try more complicated examples i just wanted to show it can work on your machine 01:02:34.700 |
and you can see the log you know i'm just looking for and if i like change the page while just doing 01:02:43.420 |
something it's going to like mess up so i'm going to click the page and see what it does like so someone 01:02:47.580 |
asked about click things of that nature what's it going to do now 01:02:54.380 |
so see it crashed now because i changed to a different page it didn't know what to do so 01:03:03.740 |
examples you can interact when it's when it's going through the motion as well and then 01:03:09.580 |
i believe i have an uh for the mcp server i set up a claude instance 01:03:24.940 |
and then i have a my nova act mcp servers there so i'm able to actually you know i click this you can see 01:03:32.060 |
all the tools it has available so i can ask it to like navigate a website so 01:03:36.860 |
anyone having a complex example you can see the mcp server so i know some people have been asking some 01:03:45.260 |
complex examples so go ahead and give me one yeah you got you got one one one one question i had is uh 01:03:53.820 |
can nova support like drag and drop functionality you can try it do you have a specific website that 01:04:16.220 |
draw io uh let's go to draw.io and make a cool diagram use nova act 01:04:32.700 |
all right so let's go to draw io all right it opened the page 01:04:45.660 |
do i have to accept something nope it's going oops all right open dry let's see 01:04:56.700 |
wait for page to load look at my initial setup for template selections all right it's going 01:05:08.860 |
oh it crashed what happened oh do i have to allow allow always i took a screenshot 01:05:18.460 |
i need to continue the browser session to see what's available let's look at the screenshot all right it's 01:05:31.180 |
it's opening up again uh it's going to draw io 01:05:36.300 |
yeah if i keep clicking away it clicks back to the die the browser session so i need like two monitors 01:05:57.180 |
let's see let's see let's see is it going to figure out how to use draw io 01:06:00.620 |
wait for page take screenshot look for template options come a blank paper all right it's 01:06:08.860 |
so it's kind of i didn't give it any specific instructions i just said make something cool so 01:06:14.060 |
maybe that's too hard to interpret for this website maybe i have to say click this click the square button and 01:06:19.820 |
then drag the square to the center or something i might have to be more explicit for that 01:06:24.220 |
so it seems it seems to have frozen all right it's clicking something all right click new 01:06:43.420 |
again it's not like super real time it's going it's not like instantaneously but it's it is clicking 01:06:48.540 |
through the buttons clicking through stuff all right did it do anything oh the claude 01:06:56.460 |
so it looks like it failed so yeah it looks like claude failed that one so i won't blame nova for that but 01:07:03.420 |
that's like that's the idea so thanks for trying to do something hard 01:07:08.540 |
okay another question back there oh yeah can we try another one yeah let's try another one sure can 01:07:14.700 |
we do uh you know on google maps find the top three rated coffee shops with within a mile radius of 01:07:22.380 |
this hotel top three coffee shops shops near the marriott marquis in san francisco 01:07:41.980 |
all right open maps google search marriott marquis san francisco wait for results to load so it has a 01:07:50.940 |
plan it's going to do something so let's see it opened google maps 01:08:01.340 |
all right let's have marriott marquis san francisco so it's able to type that 01:08:13.820 |
so there's a coffee button let's see if it clicks that i'm curious 01:08:23.420 |
looks like it's frozen give it a couple more seconds 01:08:40.540 |
whether to click it got this 15 minutes i was trying to type in that box okay 01:09:00.380 |
all right it's typing coffee shops all right all right it's going 01:09:05.900 |
all right so all right it'll open the coffee shops and let's see if we can get those top three there's a 01:09:18.780 |
4.8 a 4.7 another 4.7 let's see if it can get that 01:09:44.300 |
i'm gonna blame claude claude desktop might need a different mcp client 01:09:52.380 |
i think yeah i think claude desktop doesn't like doing that but again because it's an mcp server i can open 01:10:01.260 |
up a different mcp client so i can open like cursor for example and ask it questions through that 01:10:20.620 |
you see it has the mcp tools oops it has this up let me just open up a new one 01:10:37.580 |
and then it's called the mcp tool again so that's the beauty of mcp i already have this server i can 01:10:46.780 |
just use a different client it can understand all the information it needs to and do the exact same command 01:10:51.500 |
so it's going to do the same thing cursor might be smarter than claude codes 01:10:56.140 |
but yeah it's able to do the exact same type of thing so 01:11:08.620 |
that model is that running in the cloud 01:11:11.980 |
yes so the question was where is nova act running and yes it's running in the cloud 01:11:16.140 |
so yeah it's just you get that api key and it's doing the call behind the scenes in the aws cloud 01:11:21.100 |
so then what what does it upload to the cloud 01:11:24.060 |
well it's asking the questions and like you know go to google maps and then they would say i understand 01:11:29.260 |
that and it's actually clicking those buttons and doing the actions so the actual uh intent of 01:11:34.940 |
what you're trying to do in the specific action 01:11:39.100 |
and then if i was if i was using it locally you couldn't use nova act locally it has to be uh connected 01:11:47.500 |
to the internet to use it okay but if i for example though if i i wanted it to like look at my gmail 01:12:01.100 |
ah yeah i see what you're saying yeah yeah it's you it's i mean it is you know it's an api endpoint 01:12:06.220 |
it's being passed to aws so you know only passing information that you feel like it's not going to be 01:12:10.620 |
without training the data or taking any of that nature but it's going to the aws cloud and processing 01:12:15.420 |
you know what to click on this button locally on your like browser 01:12:24.460 |
so looks like it's not yeah see now it's even certain the rating it actually knows which rating 01:12:31.740 |
right yeah yeah well nova act is executing like in this mcp server example i say you know 01:12:43.020 |
find the top three coffee shops near the marriott marquis and then i'm passing that 01:12:47.980 |
information the llm is understanding that plan and then it uses nova act to interact with the 01:12:53.740 |
browser because like cursor or claude code or amazon q they can't interact with 01:12:58.700 |
the specific uh you know website by themselves it uses nova act to do that right but like given a 01:13:05.020 |
question though like how does it come up with the plan oh the mcp server like the 01:13:11.340 |
client so i picked the model in the example we had the mcp client as we showed the model 01:13:17.020 |
right like claude 3.5 yeah that's coming up with the plan same thing here you know i asked you know 01:13:22.460 |
help me find the top three coffee shops near the marriott marquis the model that uh cursor is using 01:13:28.620 |
is coming up with that plan and then i'm using the nova act mcp server to act on it exactly so this is 01:13:34.460 |
the plan search for marriott marquis click the marriott marquis you know search for the things 01:13:38.300 |
and you see all this information nova act returned and actually it will return this time so i think 01:13:43.980 |
the problem was with claude desktop but it got the top three coffee shops there right what are all 01:13:49.420 |
the tools that uh nova act can do today uh so the mcp server is what i wrote so uh but the idea behind 01:13:56.860 |
nova act again it can interface with the web browser that's the tool the browser is the tool and it can 01:14:01.740 |
do anything on the website you can actually click through go through the example etc i see 01:14:05.980 |
you got the repo you got an architecture that showed the mcp just so they can see it yeah so i mentioned 01:14:15.820 |
uh there are official aws mcp servers so uh this aws labs mcp and a lot of different um mcp servers here 01:14:25.340 |
for the nova act one i created my own uh go back to the nova act examples or where do the 01:14:32.700 |
ah here when i use amazon q to explain you know the mcp server for like what what's going on what tool 01:14:41.180 |
was the browser session performing an action on the browser so this is a good uh thing to talk about so 01:14:47.580 |
can you dive deeper on the browser action function and we can see because this is how it's actually 01:14:57.180 |
acting so uh amazon q says browser action is designed to perform actions it has this uh what's cool about 01:15:07.340 |
it it just does oops it's going to the code it performs a single action in the nova act browser 01:15:15.260 |
so it's executing that action it stores this act.act is like what nova says you know click the search 01:15:21.740 |
bar do this x y you know the mcp client understands how to use this act.act it passes 01:15:29.260 |
the correct action so we saw the example here one of the actions was like go to google maps or click 01:15:35.820 |
this button or do that search that's how it's able you know these actions and then the nova act mcp server 01:15:41.980 |
is translating that to actually click that button so the mcp server provides all the interfaces it 01:15:47.820 |
needs so then these mcp clients can interact and do actions and do things yeah and nova act 01:15:54.620 |
is just the model in the background that's able to click those buttons extending this question so your mcp 01:16:03.340 |
server so claude uh or cursor running locally right it's calling your mcp server that's also running 01:16:10.380 |
locally and your mcp server is the one that spun up the i guess the chromium instance yeah 01:16:16.220 |
is your mcp server taking screenshots of what you see in chromium and shipping them to nova act 01:16:23.900 |
the screenshots are local and then based on that like you can see it's actually getting all the 01:16:27.900 |
information uh the final page information so it's not storing your screenshot data and sending that 01:16:33.100 |
everything is running locally it's clicking those buttons based on what's on the browser session 01:16:38.140 |
got it but is any of the information in chromium does any of that need to be sent 01:16:42.860 |
anywhere no no everything is running locally okay distinction okay perfect thank you 01:17:09.900 |
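A minimal sketch of the single-action tool pattern discussed in this exchange. It is not the speaker's server code: `FakeBrowser`, `SESSIONS`, the result fields, and the exact function signatures are assumptions, and the stand-in browser only records instructions so the example runs without Nova Act or an MCP framework installed. In a real server the session object would wrap a Nova Act browser and both functions would be registered as MCP tools.

```python
import json
import uuid
from dataclasses import dataclass, field


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a Nova Act browser session."""
    starting_page: str
    history: list = field(default_factory=list)

    def act(self, instruction: str) -> dict:
        # A real session would drive the browser; here we only record the step.
        self.history.append(instruction)
        return {"instruction": instruction, "step": len(self.history)}


# Open sessions, keyed by the session id handed back to the MCP client.
SESSIONS = {}


def browser_session(starting_page: str) -> str:
    """Tool 1 (assumed name): open a browser session and return its id."""
    session_id = uuid.uuid4().hex
    SESSIONS[session_id] = FakeBrowser(starting_page)
    return session_id


def browser_action(session_id: str, action: str) -> str:
    """Tool 2 (assumed name): perform exactly one natural-language action."""
    browser = SESSIONS.get(session_id)
    if browser is None:
        return json.dumps({"ok": False, "error": "unknown session"})
    result = browser.act(action)
    return json.dumps({"ok": True, "session_id": session_id, **result})
```

The client's model produces the plan and calls `browser_action` once per step, which is why the same server can sit behind claude desktop, cursor, or amazon q.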
where was that looking so one of the things about making mcp servers is you have to provide a lot 01:17:18.540 |
of context so uh for nova act like i say you know when writing actions for nova act be descriptive of 01:17:24.860 |
what to do you know click the hamburger menu icon go to order history not just find my order so the more you 01:17:30.620 |
know uh concise and just prescriptive of what you want to do it's better you know search for hotels in 01:17:36.140 |
houston sort by average customer review so the more specific it is uh that's how the mcp client 01:17:42.700 |
is able to make those great requests and find the information so type coffee maker in the search box press enter so 01:17:48.700 |
the more prescriptive you are with nova act the better results you're going to get and i encoded that all into 01:17:53.260 |
this uh mcp server so the clients can leverage that so i think that's probably one of the hardest things 01:18:00.540 |
about making the mcp servers that's making sure you provide the right context of when to use the tool how to use the 01:18:05.980 |
tool the inputs and outputs but once you solve all that it's very easy to plug and play the different mcp 01:18:52.140 |
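One way to picture "encoding the context into the mcp server" is the tool definition itself. The sketch below assumes the usual MCP tool shape (`name`, `description`, `inputSchema`); the wording of the description, the parameter names, and the `missing_arguments` helper are illustrative and not taken from the speaker's repository.

```python
import json

# Hypothetical tool definition: the description carries the prompting advice
# so that any MCP client's model sees it when deciding how to call the tool.
BROWSER_ACTION_TOOL = {
    "name": "browser_action",
    "description": (
        "Perform ONE action in an open browser session. "
        "Be prescriptive about what to click or type. "
        "Good: 'click the hamburger menu icon, go to order history'. "
        "Good: 'type coffee maker in the search box and press enter'. "
        "Avoid vague goals such as 'find my order'."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "session_id": {
                "type": "string",
                "description": "Id returned when the session was opened.",
            },
            "action": {
                "type": "string",
                "description": "A single, explicit browser instruction.",
            },
        },
        "required": ["session_id", "action"],
    },
}


def missing_arguments(tool: dict, arguments: dict) -> list:
    """Return the required argument names that the caller left out."""
    required = tool["inputSchema"].get("required", [])
    return [name for name in required if name not in arguments]


# The definition is plain data, so it serializes directly for a client.
serialized = json.dumps(BROWSER_ACTION_TOOL, indent=2)
```

Keeping the guidance in the description means it travels with the server, so every client gets the same advice without extra prompting.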
right so when nova act is doing something it's passing back the log of everything it's doing so you 01:18:57.500 |
know what steps it did so the starting page the act the results the action result id so it's keeping a 01:19:03.420 |
log of everything it did so it's able to get that json and understand what the id what the 01:19:08.860 |
result is so you can see what it's doing so it can move on to the next step yep 01:19:27.900 |
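A small sketch of how a client could read that kind of JSON log to decide where to continue. The field names (`starting_page`, `act`, `action_result_id`, `success`) and the sample entries are assumptions made up for illustration, not the actual payload returned by the server in the demo.

```python
import json


def last_completed_step(raw_log: str):
    """Return the id of the most recent successful action, or None."""
    entries = json.loads(raw_log)
    completed = [entry for entry in entries if entry.get("success")]
    return completed[-1]["action_result_id"] if completed else None


# Hypothetical log in the spirit of what the talk describes.
example_log = json.dumps([
    {
        "starting_page": "https://www.google.com/maps",
        "act": "search for marriott marquis san francisco",
        "action_result_id": "step-1",
        "success": True,
    },
    {
        "act": "search for coffee shops nearby",
        "action_result_id": "step-2",
        "success": True,
    },
    {
        "act": "click the first result",
        "action_result_id": "step-3",
        "success": False,
    },
])
```

With a log like this, the client can see that the last good step was the coffee shop search and retry or rephrase only the step that failed.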
sorry a quick question is this able to do uh like uh automated ui testing because of this 01:19:35.100 |
well with nova act you know you can define like what you want it to do so you're going to have to define 01:19:40.940 |
you know go to this button click this does this work so you can define that workflow so i mentioned 01:19:46.460 |
before like back in the day like if i'm writing selenium code i have to go click this h1 tag do 01:19:51.260 |
this like now you can just write in natural language you know click this button click that button so yes 01:19:55.580 |
it can handle that use case uh specifically of like opening the browser or checking these things 01:20:00.300 |
but you have to like you know this nova act search for coffee maker you know you specifically have 01:20:05.580 |
to write what buttons to press yeah thank you 01:20:12.860 |
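A minimal sketch of what a natural-language UI test could look like under that approach. `UiTest`, `run_ui_test`, and the `act` and `check` callables are hypothetical; in practice `act` and `check` would wrap calls into a browser agent such as Nova Act, while here they are plain functions so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class UiTest:
    name: str
    steps: List[str]  # one explicit UI instruction per entry
    expect: str       # yes/no question asked after the last step


def run_ui_test(
    test: UiTest,
    act: Callable[[str], bool],
    check: Callable[[str], bool],
) -> dict:
    """Run each step in order and stop at the first one that fails."""
    for index, step in enumerate(test.steps, start=1):
        if not act(step):
            return {"name": test.name, "passed": False, "failed_step": index}
    return {"name": test.name, "passed": check(test.expect), "failed_step": None}


search_test = UiTest(
    name="search",
    steps=[
        "click the search box",
        "type coffee maker and press enter",
    ],
    expect="is a list of search results visible?",
)
```

The steps replace selector-based code, but each one still names the exact button or field to use, as the answer above points out.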
let's see i guess we have time i can show some multi-agent collaboration with strands that could be 01:20:25.020 |
something cool uh i think i have a repo for that so should be uh go to the aws labs page where's that 01:20:49.340 |
okay i'm just gonna copy this code and put it into our environment 01:21:14.700 |
so in this example i'm actually going to show how strands does multi-agent collaboration so one way 01:21:33.580 |
i'm actually going to create a powerpoint presentation based on uh you know a cloud migration request i want 01:21:40.220 |
to like move my uh infrastructure from on premises to the cloud give me a presentation of how i would do that 01:21:46.620 |
and so for this i created three different agents i created a cost analysis agent so i have a system 01:21:52.540 |
prompt there a solutions architect agent to map out what you're going to be doing and then each of 01:21:58.780 |
these uh tools is an actual agent so this uh cost analysis has the docs mcp server the cost analysis mcp 01:22:06.940 |
server it has its own prompt the presentation agent has its own system prompt it has a tool from the 01:22:13.100 |
powerpoint mcp server that i'm using and there's an architecture agent it also has you know its own 01:22:19.180 |
specific tools system prompt etc so different agents for different uh things in the workflow and then i have 01:22:27.900 |
this orchestrator agent called the migration orchestration agent it has a prompt and i tell it 01:22:32.540 |
what tools it has access to and then the cool thing with strands is i make this orchestrator agent and 01:22:38.620 |
then the tools are just other agents in that so it knows when to call this agent for this particular tool 01:22:44.460 |
when to do that and i say you know i want to migrate my uh workload so use the right tools to find 01:22:52.940 |
that so i made a fictional company called shop easy e-commerce they have on-premise java mysql database 01:23:00.140 |
yeah i want zero downtime migration like all these little constraints in there and i wanted to 01:23:06.060 |
make a migration plan and a powerpoint presentation that i can present to my executives of how this would 01:23:11.580 |
work and i just define that and the orchestrator agent will figure out what to do i don't specifically say 01:23:18.060 |
do this one first do that first we'll let the the agent figure that out so let me run that strands 01:23:33.020 |
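The "agents as tools" arrangement can be sketched in plain Python. This does not use the Strands SDK, which has its own agent and tool abstractions; `make_specialist`, `Orchestrator`, and `choose_tools` are hypothetical names, and `choose_tools` stands in for the model's decision about which agent to call and in what order.

```python
from typing import Callable, Dict, List


def make_specialist(name: str, system_prompt: str) -> Callable[[str], str]:
    """Wrap a specialist agent as a plain tool function."""
    def tool(request: str) -> str:
        # A real agent would call a model with system_prompt and its own
        # MCP tools; this stub only reports that it was invoked.
        return f"[{name}] handled: {request}"
    tool.__name__ = name
    tool.__doc__ = system_prompt
    return tool


class Orchestrator:
    """Top-level agent whose tools are other agents."""

    def __init__(
        self,
        system_prompt: str,
        tools: List[Callable[[str], str]],
        choose_tools: Callable[[str, List[str]], List[str]],
    ) -> None:
        self.system_prompt = system_prompt
        self.tools: Dict[str, Callable[[str], str]] = {t.__name__: t for t in tools}
        self.choose_tools = choose_tools  # stand-in for the model's planning

    def run(self, request: str) -> List[str]:
        outputs = []
        for tool_name in self.choose_tools(request, list(self.tools)):
            outputs.append(self.tools[tool_name](request))
        return outputs


architecture_agent = make_specialist("architecture_agent", "Design the target architecture.")
cost_analysis_agent = make_specialist("cost_analysis_agent", "Estimate monthly cost.")
presentation_agent = make_specialist("presentation_agent", "Build the slide deck.")

orchestrator = Orchestrator(
    system_prompt="Plan a cloud migration using the available agents.",
    tools=[architecture_agent, cost_analysis_agent, presentation_agent],
    # Fixed order here; in the demo the model decides the order itself.
    choose_tools=lambda request, names: names,
)
```

The caller only describes the goal and constraints; ordering the specialists is left to whatever sits behind `choose_tools`.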
all right so cloud migration agents as tools all right again so all the mcp servers running locally 01:23:42.780 |
it downloads it's using uvx it starts with the architecture design first generates a diagram 01:23:54.940 |
i'm going to use waft so take some time it might fail but it will just update update itself 01:24:03.420 |
all right i think it couldn't generate the diagram there but it's saying all right i'm just gonna this 01:24:17.580 |
this is what the diagram should have this is what we're going to be doing 01:24:24.140 |
now it's just going to do a cost analysis on basically the things we did there so 01:24:31.500 |
this workflow takes a couple minutes to run but you can see it's calling all these 01:24:36.540 |
agents uh different things it's understanding what to do what actions to take first it's finding pricing for 01:24:42.780 |
eks because it has a cost analysis tool and knows where to find that information so it has the up-to-date 01:24:49.340 |
pricing all the time finding pricing for aurora for its database so it's able to understand all that information 01:24:55.100 |
and get the real-time up-to-date information just because we have that uh pricing mcp server from the aws labs 01:25:02.460 |
it's called cost analysis yeah cost analysis mcp server documentation all the stuff you need for 01:25:15.740 |
finding the right price on aws it has all that information and the agent was able to just use that 01:25:21.420 |
now it's going to generate a report so it's still running again this does take a while because i'm 01:25:32.460 |
asking a very complex question a lot of things going so it does take a couple minutes to run through all 01:25:37.660 |
that it gets its monthly spend predictions monthly savings etc so it's able to understand all the information 01:25:45.260 |
and get all up-to-date information based on the plan we provided and the last thing now it wants to create 01:25:53.020 |
an executive presentation so it downloads the powerpoint mcp server and now it's going to make a powerpoint 01:25:57.980 |
presentation based on that so adding the title slide so you know add a placeholder so generating powerpoint is a 01:26:08.700 |
very popular use case and there's an mcp server that can go ahead and just do that add bullet points 01:26:13.980 |
etc so give it another minute or two