
Building Agents with Amazon Nova Act and MCP - Du'An Lightfoot, Amazon (Full Workshop)



00:00:00.000 | building agents with Amazon Nova Act and MCP I'm excited today because we're going to build
00:00:25.680 | intelligent autonomous AI systems that can help you build scale and improve your applications
00:00:35.440 | and business my name is Du'An Lightfoot and I'm
00:00:54.960 | joined by hey I'm Banjo Obayomi I'm a solutions architect here at AWS now this is the AI Engineer
00:01:03.480 | World's Fair and I've been in tech over 15 years and right now is the most exciting time for me in my
00:01:13.560 | entire career and one of the reasons for this excitement is agents how many of you right now
00:01:20.640 | are building agentic systems I love it so when we talk about agentic AI I think it's important that we
00:01:31.800 | level set from an AWS perspective there are three key terms we need to think about first the ability to
00:01:41.160 | plan the agent gets a prompt it gets an objective and it determines the actions that need to be taken so it
00:01:49.680 | creates the plan and then it acts on that plan by using things like tools now the last piece
00:01:58.080 | the third piece and probably the most interesting is the reasoning where the agent is able to evaluate
00:02:06.600 | the results and determine if it needs to update the plan and take additional actions until the objective is complete this is an agent
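The plan, act, and reason cycle described here can be sketched in a few lines of plain Python. This is only an illustration of the loop, not code from the workshop: `run_agent`, `AgentState`, and the three callback names are hypothetical, and a real agent would have an LLM behind `plan_fn` and `done_fn`.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentState:
    objective: str
    plan: list[str] = field(default_factory=list)
    results: list[str] = field(default_factory=list)


def run_agent(
    objective: str,
    plan_fn: Callable[[AgentState], list[str]],
    act_fn: Callable[[str], str],
    done_fn: Callable[[AgentState], bool],
    max_steps: int = 5,
) -> AgentState:
    """Plan, act, then evaluate results and re-plan until done (hypothetical sketch)."""
    state = AgentState(objective=objective)
    state.plan = plan_fn(state)  # 1. plan: turn the objective into steps
    for _ in range(max_steps):
        if not state.plan:
            break
        step = state.plan.pop(0)
        state.results.append(act_fn(step))  # 2. act: usually a tool call
        if done_fn(state):  # 3. reason: is the objective complete?
            break
        state.plan = plan_fn(state)  # otherwise update the plan from the results
    return state


# Stand-ins for the model and tools, so the loop can run without any service.
def demo_plan(state: AgentState) -> list[str]:
    todo = ["search", "open first result", "read title"]
    return todo[len(state.results):]


def demo_act(step: str) -> str:
    return f"did: {step}"


def demo_done(state: AgentState) -> bool:
    return len(state.results) == 3


final_state = run_agent("find a coffee maker title", demo_plan, demo_act, demo_done)
```

The `max_steps` cap is a design choice in this sketch so that a planner which never finishes cannot loop forever.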
00:02:16.860 | now when we actually break down the architecture I think it's important to take a look at this because we have the user input we have the agentic system we have the possibility of some type of human in the loop and then we have the generated response
00:02:34.860 | now when we dive a little deeper there are some components of this agentic system we have the LLM we have a knowledge base with external information that we may want to provide we have guardrails to say to the model don't do this or to ground the model with the truth from our knowledge base to say okay is this actually relevant information is this accurate to the information we're receiving from the knowledge base
00:03:02.860 | and then we have access to additional tools memory or we may need to talk to additional agents or LLMs like Amazon Nova Act or do something like MCP and we have the ability to design our own flows for these systems
00:03:18.860 | now the most interesting piece that I think a lot of us are probably focused on when we're building these systems is around the continuous evaluation framework like how do we know if we're using the
00:03:30.860 | right LLM how do we know if our prompt is consistent accurate or even optimized for the performance we're expecting and then how do we even judge our system how do we rate that and determine that it's actually solving the problems that we need or intend
00:03:46.860 | now once we have this we need to log this information and then have some type of subject matter expert and determine how can we improve this system and this is the iterative approach so we're always trying to improve
00:03:58.860 | and optimize our agentic system now continuing on with this story there are some use cases that we should be building these systems for like if it's complex tasks and we don't know which tool should be used how many tools should be used and we want the model to leverage its reasoning capabilities well this is a great use case for an agentic system but if it's something that is just one step
00:04:26.860 | our traditional if this then that approach is probably the best solution right we don't always need to provide some type of agentic system for something that can be done with a traditional solution
00:04:39.860 | now when we talk about agents on AWS there are three approaches and perspectives we should think about
00:04:46.860 | first it's going to be the specialized using something like Amazon Q how many of you have used Amazon Q there's Amazon Q in the console to help solve your problems on AWS in the console
00:04:59.860 | there's Amazon Q Developer inside of your IDE and right now the one that I'm I would think most excited about is the Amazon Q CLI agent how many of you have used that
00:05:08.860 | for me if you are into increasing your productivity using a CLI agent has helped me tremendously from editing video it can do that to summarizing a document to reading my entire code base like today for one of my demos I had some code and I was trying to figure out why it wasn't working I said analyze this code and tell me what you see
00:05:36.860 | let me know the APIs that it's calling well I looked at the APIs well they didn't match my APIs in the API gateway so when the code was deployed it wasn't deployed with the right APIs so the agent was able to help me save a ton of time by just analyzing the code and telling me what it saw because I'd never seen the code before right so that's what these tools are able to help us do the next is fully managed if you're using Amazon Bedrock you're able to leverage Amazon Bedrock agents to build
00:06:04.860 | and manage agents inside AWS and today what we're going to be focused on is the DIY the do-it-yourself approach by using Strands Agents this allows you to not just leverage Amazon Bedrock but also leverage models through other providers using LiteLLM
00:06:24.860 | now when we talk about Strands Agents Strands Agents was announced about a month ago I want to say something like a month ago this is open source extremely lightweight
00:06:34.860 | so if you've used other agent frameworks it's like that but with the implementation you'll see in the code how easy it is to build an agentic system or an agent itself in a few lines of code and get started I built a multi-agent solution in under 50 lines of code
00:06:52.860 | and so when we break down Strands Agents there are three components we have a prompt we have an LLM and we have tools so you create a function called let's say a get weather tool right you define your agent you give it a prompt and it's already implemented and you'll see in the code
00:07:14.860 | as banjo goes through here in a moment now taking it a step further as danielle presented today on amazon nova act
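To make the three components (a prompt, an LLM, and tools) concrete, here is a minimal stand-in written in plain Python. It deliberately does not import the Strands SDK: the `Agent` class, the `tool` decorator, and `fake_model` below are hypothetical placeholders that only mirror the shape described in the talk, with a canned function where the LLM would be.

```python
from typing import Callable

# Registry of tools the agent may call (hypothetical, for illustration only).
TOOLS: dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a plain function as a tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_weather(city: str) -> str:
    """Return a canned forecast; a real tool would call a weather API."""
    return f"Sunny in {city}"


def fake_model(system_prompt: str, message: str, tools: dict) -> dict:
    """Stand-in for the LLM: decides whether a tool applies to the message."""
    if "weather" in message.lower() and "get_weather" in tools:
        city = message.rstrip("?").split(" in ")[-1]
        return {"tool": "get_weather", "args": {"city": city}}
    return {"text": "I have no tool for that."}


class Agent:
    """Prompt + model + tools, the three components named in the talk."""

    def __init__(self, system_prompt: str, model: Callable, tools: dict):
        self.system_prompt = system_prompt
        self.model = model
        self.tools = tools

    def __call__(self, message: str) -> str:
        decision = self.model(self.system_prompt, message, self.tools)
        if "tool" in decision:
            return self.tools[decision["tool"]](**decision["args"])
        return decision["text"]


agent = Agent("You are a helpful weather assistant.", fake_model, TOOLS)
answer = agent("What is the weather in Seattle?")
```

In a real framework the model, not a keyword check, chooses the tool and its arguments; the dispatch step in `__call__` is the part that stays roughly the same.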
00:07:24.860 | these models are able to do some really cool things and this is another thing that i'm excited about amazon nova act is a research preview model and its capabilities allow you to use a prompt or give instructions
00:07:42.860 | and take complex tasks and do things like browse the internet to do research or search on amazon.com to find the top list of widgets
00:07:52.860 | right and then return them and then add them to your cart so you'll see how we can leverage this not just using the sdk for amazon nova act but also by leveraging mcp which leads us into the last piece which i think when we're talking about agents
00:08:10.860 | i don't think we would be here today as fast as we have moved if it wasn't for mcp how many of you are
00:08:17.900 | leveraging mcp today model context protocol how many of you have built your own mcp servers
00:08:23.580 | i've built several um i've got two that i use all the time one how many of you use obsidian
00:08:33.580 | okay so for my documentation i built an obsidian mcp server this allows me to save all my documents
00:08:41.500 | reference all my documents and just my entire workflow is streamlined because of this mcp server i use right
00:08:48.780 | there but i also use one for my bookmarks i built a bookmark manager because every friday
00:08:54.620 | i'm restarting my computer and i lose my bookmarks i save them and i forget about them but now i can just
00:08:59.500 | say save this bookmark it gives it a description gives it a title gives it a date and i can even add
00:09:04.380 | notes so i can remember what this bookmark is so now when i open up q cli i can say hey
00:09:10.140 | i'm looking for um some information on mcp can you tell me all the bookmarks that i have then it'll
00:09:15.260 | find it can you tell me the ones i saved last week and so this is the power that we have today
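The bookmark manager is only described, not shown, so the following is a guess at the kind of data model such a server might sit on top of. `Bookmark`, `BookmarkStore`, and both query methods are hypothetical names; the fields (title, description, date, notes) come from the description above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Bookmark:
    url: str
    title: str
    description: str
    saved_on: date
    notes: str = ""


class BookmarkStore:
    """In-memory store; an MCP server could expose save/search as tools."""

    def __init__(self) -> None:
        self._items: list[Bookmark] = []

    def save(self, bookmark: Bookmark) -> None:
        self._items.append(bookmark)

    def search(self, topic: str) -> list[Bookmark]:
        """Case-insensitive match against title, description, and notes."""
        needle = topic.lower()
        return [
            b for b in self._items
            if needle in f"{b.title} {b.description} {b.notes}".lower()
        ]

    def saved_since(self, start: date) -> list[Bookmark]:
        """Bookmarks saved on or after the given date."""
        return [b for b in self._items if b.saved_on >= start]


# Example data with fixed dates so the result does not depend on today's date.
today = date(2025, 6, 10)
store = BookmarkStore()
store.save(Bookmark("https://example.com/mcp", "MCP intro",
                    "Overview of Model Context Protocol", today - timedelta(days=2)))
store.save(Bookmark("https://example.com/old", "Old post", "Something else",
                    today - timedelta(days=30), notes="mentions mcp briefly"))

about_mcp = store.search("mcp")
last_week = store.saved_since(today - timedelta(days=7))
```

A request like "the ones I saved last week" would then map to `saved_since` with a date seven days back, with the agent choosing the arguments.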
00:09:20.780 | but with that being said i think it's time that we all start building banjo is going to take over but
00:09:27.500 | if you you open your laptops and log on to this link this is going to take you to a workshop environment
00:09:34.060 | where you have access to an amazon account where banjo is going to walk you through building out
00:09:39.740 | today's workshop i thank you for your time cool all right so uh this is going to be a hands-on workshop so
00:09:46.540 | we've provisioned an aws account for everybody here so you don't have to install anything on your
00:09:51.180 | computer everything's going to be done through the browser and i always say the hardest part of the
00:09:56.220 | workshop is just getting started so some of my colleagues are also here so raise your hand aws
00:10:00.220 | folks that are here to support so we're going to take some time to just get logged into an environment
00:10:05.340 | we're going to set up a vs code server enable models get the nova act api key so again this is the
00:10:11.340 | hardest part of the workshop that's getting started so let's take some time to just get into the environment
00:10:16.620 | and i'll follow along as well so and this is uh again everything is you don't have to install
00:10:23.020 | anything on your computer you don't have to use your own aws account everything is provisioned for
00:10:26.380 | you so but while that's loading i'm going to briefly walk through the three modules of the workshop
00:10:30.540 | so the workshop is really about how you can use nova act so the first module is just getting started
00:10:36.140 | with nova act we're going to make an api call then the second module is going to make an mcp
00:10:41.420 | server that can leverage nova act and finally we're going to use the strands agent to hook
00:10:45.900 | everything together so that's kind of the three steps we'll go through in this workshop and all the
00:10:50.380 | code is available via the link on github so you can try it on your own as well uh but yeah trying to get
00:10:56.940 | started here if you can't follow along i'm going to be doing it up here so don't worry too much and again
00:11:01.740 | all the code is available so you can try it out offline okay so first things first uh if you're
00:11:09.820 | following along make sure to click this open aws console button again we provisioned the aws account
00:11:15.180 | you know don't log into your own aws account don't try to create a new one everything is uh provisioned
00:11:20.460 | here already so i'm clicking that button to open up the aws account so i'm logged into my aws account
00:11:34.940 | so the first thing we do in the aws account we're gonna enable uh amazon bedrock models so
00:11:39.740 | amazon bedrock think of it as a serverless api to access different foundation models and you can build lots
00:11:45.340 | of agentic applications in it so it has capabilities like knowledge bases guardrails you can build agents
00:11:51.580 | on top of it for anything you need to build uh ai agents or agentic applications amazon bedrock has
00:11:57.740 | capabilities for that but for this workshop we're just gonna enable specific models so
00:12:04.860 | you can click the amazon models and then we'll use claude 3.7 3.5 haiku and 3.5 sonnet
00:12:13.020 | so those are the ones we're going to use for this workshop
00:12:17.820 | and that's going to request access there
00:12:21.020 | and again all the instructions are also in this workshop as well so we could follow along but i'm
00:12:31.820 | just going to go through it just for sake of time and then the next part once we get the model access
00:12:37.500 | there's a vs code server that has everything set up already so i'm just going to go in there
00:12:47.100 | and if the url and password is there you can log into your vs code server with everything installed
00:13:04.220 | i'm also going to log into amazon q so amazon q is an ide extension to help you write code
00:13:14.860 | if we have time you can sign up through a builder id completely free you don't need uh an aws account
00:13:19.740 | you don't need to put in your credit card you can just log in through there i already have an account
00:13:24.220 | so just feed it up but it puts a nice little ai agent there you can ask questions update code etc so
00:13:31.980 | it's uh i'll show you some examples i'll just go through some of the code so
00:13:36.300 | so who's gotten to this point setting up all the models workshops because this is once you get all this
00:13:44.300 | done then that's when the real fun begins so that's getting a pulse if i need to slow down or
00:13:49.340 | slow down a bit okay i'll wait a bit again raise your hand if you're stuck anywhere questions
00:13:54.860 | we have uh agents that can come around and support you so i'm going to pause for a little bit
00:13:59.740 | any general questions while we're waiting
00:14:09.660 | oh yeah so this workshop uh again all the code is available online uh this workshop is available as
00:14:15.180 | well so you can also look through that there's a website called workshops.aws
00:14:22.860 | and when you go there you can search something like uh nova act and then it's the only workshop that shows
00:14:29.420 | up so you can always go to workshops.aws and search nova act and this workshop will show up so you can
00:14:35.180 | see all the instructions all the code and run this uh on your own
00:14:52.300 | okay and then last thing uh because we're going to use nova act we actually need to get a nova act api
00:15:00.460 | key so if you go to nova.amazon.com this is a website where you can use the amazon nova models so
00:15:07.900 | you can do like chatting generating images uh speaking with nova uh generating videos but then also
00:15:14.140 | this is where the act api key is generated so if you're following along and you want to generate your
00:15:20.700 | key again it's free to log in you can use your amazon.com uh like when you order something on
00:15:26.620 | amazon.com account to log into this and then you can just generate a key here and then you'll be able to
00:15:31.740 | access that
00:15:49.420 | oops oops okay so i'm going to walk through what module one is uh before again has anybody got
00:15:56.380 | in here just quick pulse check if not you know i'll continue i know the wi-fi is slow so it might be
00:16:01.580 | hard so i'll just continue on uh but yeah the first one we're going to see how nova act works uh
00:16:08.700 | how the actual code looks uh i generated the key i need to export the key and then kind of running
00:16:14.780 | the first script which is actually going to open amazon.com
00:16:18.140 | uh and we're actually going to look for the first coffee maker so let me see how that code looks like
00:16:24.460 | let's go here make this bigger
00:16:27.900 | so very simple code uh with nova act it's again it's all in the python sdk so i decide what page to go to
00:16:43.980 | so go to amazon.com i say i want you to search for a coffee maker i say select the first result and i say
00:16:50.300 | get the title of that product page so uh very simple if you've ever done kind of web automation before
00:16:55.740 | with something like uh selenium or playwright you probably have to like look for this div tag you
00:17:00.380 | know look at this h1 tag grab this information a lot of manual processes of actually inspecting
00:17:06.060 | the actual website here i'm just saying click the search bar find something like i don't have to
00:17:10.940 | specify click this tag do that so it makes it much easier to engage with the website as a natural
00:17:16.300 | human would instead of like looking through divs and trying to find this p tag specifically so
00:17:21.900 | uh this is a great way to just uh you know use nova act right out of the box so i'm going to uh run this
00:17:29.260 | so you can see
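The script described above (open amazon.com, search, select the first result, ask for the title) might look roughly like the sketch below. The prompts are paraphrased from the talk. The `NovaAct(starting_page=...)` context manager and its `act()` method reflect my understanding of the Nova Act Python SDK and should be checked against its README; the import is kept inside a function that is never called here, so the rest of the block runs without the SDK or an API key.

```python
from typing import Any, Callable

START_PAGE = "https://www.amazon.com"

# Natural-language steps, paraphrased from the talk.
STEPS = [
    "search for a coffee maker",
    "select the first result",
]
QUESTION = "return the title of this product page"


def run_steps(act: Callable[[str], Any], steps: list[str], question: str) -> Any:
    """Send each instruction in order, then ask the final question."""
    for step in steps:
        act(step)
    return act(question)


def run_with_nova_act() -> Any:
    """Assumes the nova-act package is installed and NOVA_ACT_API_KEY is exported."""
    from nova_act import NovaAct  # third-party SDK; not needed for the dry run below

    with NovaAct(starting_page=START_PAGE) as nova:
        # The returned object is assumed to carry the model's answer;
        # inspect it to find the product title.
        return run_steps(nova.act, STEPS, QUESTION)


# Dry run with a recorder in place of the browser, to show the call order.
sent: list[str] = []


def recorder(prompt: str) -> str:
    sent.append(prompt)
    return f"ok: {prompt}"


dry_run_result = run_steps(recorder, STEPS, QUESTION)
```

Keeping the steps as short, separate instructions rather than one long prompt matches how the talk breaks the task down.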
00:17:31.500 | examples all right all right so i added my key going to see what happens when i run this file
00:17:47.500 | give it a second
00:18:00.300 | oops we failed all right let's start over
00:18:06.940 | put it down three let's write one
00:18:16.060 | ah okay i know
00:18:21.100 | i gotta run it with this let's start that over
00:18:25.980 | yeah question
00:18:36.700 | so just to explain why banjo is running that command it's running xvfb it's a frame buffer where it runs your
00:18:45.100 | x11 system what happens there
00:18:47.660 | nova act actually goes and clicks a mouse on a browser that's why it needs to be run like that
00:18:52.780 | otherwise it has no gui so this is just kind of a way to emulate a graphical user interface on this
00:18:58.540 | linux box thank you daco yeah since we're running everything in the the cloud on a browser i'm saying
00:19:04.060 | you know open a browser again but it's already in a browser so that's why it crashed so i have to put
00:19:07.980 | that frame buffer command uh and yeah the workshop kind of walks through why we did that but you can
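The exact command is not audible in the recording, so the following is an assumption about what wrapping a script in a virtual frame buffer typically looks like on Linux: the `xvfb-run` wrapper with its `-a` flag, which picks a free display number. The script name is a placeholder.

```python
import shutil
import sys


def build_command(script: str, use_xvfb: bool = True) -> list[str]:
    """Build the argv for running a script, optionally under a virtual display."""
    cmd = [sys.executable, script]
    if use_xvfb:
        # xvfb-run starts an in-memory X server so a non-headless browser
        # has a display to draw on, even on a server with no GUI.
        cmd = ["xvfb-run", "-a"] + cmd
    return cmd


def xvfb_available() -> bool:
    """True if the xvfb-run wrapper is on PATH (Linux with Xvfb installed)."""
    return shutil.which("xvfb-run") is not None


# "first_script.py" is a hypothetical file name, not the workshop's.
command = build_command("first_script.py")
```

To actually launch it you could pass `command` to `subprocess.run(command, check=True)` after confirming `xvfb_available()`; running the browser in headless mode is the alternative that avoids the wrapper entirely.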
00:19:13.820 | see uh what is going on when nova act says i'm going to search for a coffee maker i'm at the amazon home
00:19:19.580 | page my task is to search for this so it's understanding what it's doing i see the search
00:19:23.820 | bar has coffee maker i'm at the search bar here and now it actually puts the actual log of the actual
00:19:30.460 | html file so it's taking screenshots you can see what it looks like it got the first results i'm on
00:19:35.900 | the coffee maker page it selected it and now i've got the title now it says you know what's the title of
00:19:41.580 | this product page all right got this black+decker 12-cup coffee maker my task is to return the title of the
00:19:51.900 | product page product title it got that and it ended the session and then it also creates a video log that
00:19:58.380 | i can actually look at to see everything it did in this webm file so yeah question
00:20:06.540 | yeah we'll repeat that it would be a microphone
00:20:19.180 | so the question was does it reason about the page in terms of pixels or in terms of text
00:20:30.140 | yeah so it's actually looking through the actual uh the page itself so you see in this video it sees
00:20:36.460 | it looks at the page it can see what's in the page so it's it's a trained large language model so it can
00:20:41.660 | actually see what the page is doing so it's not looking at like like the h1 tag or whatnot it can
00:20:47.100 | understand the context of that particular page it can see that's a search box okay i'm going to go
00:20:51.900 | click through that search box so yes it understands at the pixel level what's on that actual page
00:20:56.620 | so this is kind of the video it's hard to see let me make it bigger it's sped up
00:21:06.380 | so it opens the page it's able to type in coffee maker there
00:21:10.700 | um it gets that information clicks the button so even with all the ads and everything the model can
00:21:16.860 | understand the task clicks that and it gets the information back so that's and that was a couple
00:21:21.900 | lines of code so you can extrapolate to other type of workflows you can do for searching through things
00:21:27.420 | sorry i have another question yeah so when you what i've experienced with these kind of frameworks is
00:21:33.580 | that when you run this on a server environment um services like cloudflare will block the access and
00:21:39.740 | maybe do a captcha challenge how do we solve that using q yeah so with uh so using amazon nova act so it
00:21:46.620 | doesn't do captchas it doesn't do things of that nature so it's it's meant for like workflows you understand but
00:21:51.020 | yes it's not going to bypass captchas and other things of that nature as well so it's made for
00:21:55.420 | like going to amazon.com or looking through a booking site but if it's something that like requires like a human or
00:22:00.540 | it wants again that you can't bypass that
00:22:02.380 | you you wouldn't use nova act for that use case if you need to pass a captcha or something else you'd
00:22:10.860 | use another technology this is not meant to like overtake humans you know it's more like i'm helping
00:22:15.420 | them augment things but not if there's a captcha involved you'd have to use a different technology for
00:22:19.580 | that it's also a preview yes this is a research preview as well so if that's
00:22:25.900 | a very good use case you know leave feedback on the nova website so yeah is is human in the loop
00:22:31.980 | possible at all with it yet well this this one it's no because i'm writing all the code here so but
00:22:39.500 | again this is python code so i could probably put in something here like you know ask something make
00:22:44.220 | an api call here so this is you know it's a python code you might be able to create some type of uh
00:22:49.660 | workflow that might augment like wait for a human response or whatnot because the browser is happening
00:22:55.180 | in like headless mode but could you make it work with a browser the human is also seeing at the same time
00:22:59.820 | yes yeah so it can pause and wait for somebody to put in like a password credentials or do a captcha
00:23:04.940 | and then once it receives that it can continue on the workflow you could do that yeah because right now
00:23:09.740 | i ran it in headless mode but yes it can also run uh you know to open up the browser if i ran this on
00:23:14.380 | my macbook it would open up a chrome browser and go through that session also if you're running it and
00:23:20.220 | you wanted to bypass something that's two-factor if you're already logged into say amazon.com and then you
00:23:27.180 | run the code it's going to use your credentials in that browser session to continue on to perform that task
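Because the workflow is ordinary Python, the pause-and-wait idea from this exchange can be sketched as a loop that hands control to a person for marked steps. Nothing here is Nova Act API: `run_with_pauses`, the `needs_human` flag, and the step format are hypothetical, and `act` stands in for whatever sends an instruction to the browser.

```python
from typing import Callable


def run_with_pauses(
    steps: list[dict],
    act: Callable[[str], object],
    ask_human: Callable[[str], object] = input,
) -> list[tuple[str, str]]:
    """Run steps in order, pausing for a person where a step is marked needs_human."""
    log: list[tuple[str, str]] = []
    for step in steps:
        if step.get("needs_human"):
            # Blocks until the person confirms, e.g. after typing a password
            # or solving a CAPTCHA in a visible (non-headless) browser window.
            ask_human(step["prompt"])
            log.append(("human", step["prompt"]))
        else:
            act(step["prompt"])
            log.append(("agent", step["prompt"]))
    return log


workflow = [
    {"prompt": "go to the sign-in page"},
    {"prompt": "Enter your credentials in the browser, then press Enter here",
     "needs_human": True},
    {"prompt": "open the orders page"},
]

# Dry run with stand-ins so nothing blocks waiting for keyboard input.
acted: list[str] = []
run_log = run_with_pauses(workflow, acted.append, ask_human=lambda prompt: None)
```

This only makes sense when the browser is visible to the person, which is why the answer above contrasts it with headless mode.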
00:23:33.180 | so that's something that you can do as well cool so let me oops and then one other thing you can also
00:23:44.140 | do multi uh you know parallel execution so my next example is actually i'm trying to find
00:23:49.900 | multiple monitors and i want to compare them all at once so i'll show you how that code looks like
00:24:02.540 | so i can check for the monitor extract information i'm setting you know i want i'm defining what i
00:24:09.900 | want so again i'm you know saying i want to find the price the rating the size uh go to amazon.com
00:24:17.260 | uh i set it to headless mode this time so i don't need to do the frame buffer i start multiple threads
00:24:22.460 | it looks for each monitor simultaneously because each of these are individual tasks so i can parallelize
00:24:27.340 | them instead of waiting for it to go through i define the list of monitors i want to go through start the
00:24:33.260 | thread and then it starts executing and finds the results of the monitors so i can run that in the
00:24:38.700 | background it's starting with three parallel threads and it's open so again running in headless mode so
00:24:49.500 | it's going to be able to do this in the background where we can see kind of what the model is thinking
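The threading pattern described here (one independent task per monitor, all started at once) can be sketched with the standard library's `ThreadPoolExecutor`. The worker below is a stub: in the workshop each thread would drive its own headless browser session, while `check_monitor`, `compare`, and the placeholder monitor names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable

# Placeholder names; the demo used real product names.
MONITORS = ["monitor A", "monitor B", "monitor C"]


def check_monitor(name: str) -> dict:
    """Stub worker. A real one would open its own browser session,
    search for the monitor, and extract price, rating, and size."""
    return {"name": name, "price": None, "rating": None, "size": None}


def compare(
    monitors: list[str],
    worker: Callable[[str], dict] = check_monitor,
    max_workers: int = 3,
) -> dict[str, dict]:
    """Run one worker per monitor in parallel and collect results by name."""
    results: dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, m): m for m in monitors}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # One failed session (for example a blocked page)
                # should not discard the other results.
                results[name] = {"name": name, "error": str(exc)}
    return results


comparison = compare(MONITORS)
```

Threads fit here because each task spends most of its time waiting on the browser and network rather than on the CPU.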
00:24:53.340 | how it navigates through the web page yep all right i have tried to use nova this past april and uh it worked for
00:25:03.900 | the first time but once i did it again it triggered the captcha is this something that has been already
00:25:10.780 | resolved or is this happening because i think the website and it was amazon in this case it was detecting
00:25:16.780 | it was a bot and uh is there like an llms.txt or robots.txt that can declare it so nova act has a github repo
00:25:25.340 | so you could go there and just grab that but it's it's working now like i'm running it you know i just
00:25:30.940 | this is live code i'm doing right now like i just exported my api key started running it so
00:25:36.380 | uh you can try it in the workshop i'll be yeah i mean it's ready to go we're building right now
00:25:41.660 | and you can kind of see that it's going on in the background what it's doing
00:25:47.100 | i've looked at this monitor the dell monitor i'm at the amazon home page it's like it's going through
00:25:52.700 | looking through the search results it's saving things so you can see it's running in parallel it got the
00:25:57.100 | information for one of the first ones so it's going as i just set that up and it can execute
00:26:02.620 | that so if you have some type of uh i don't know like daily news thing you need to go to the website
00:26:07.340 | and get news or something and have a report and there's no api for that this is one way you can
00:26:12.300 | codify how to do that kind of search and get the information
00:26:17.500 | so if you have a question yeah i'm wondering so how successful is this in terms of like more
00:26:22.380 | ambiguous tasks because i i ran the amazon demo and that worked but i'm wondering could i just add google
00:26:27.020 | there sure and and and how like how big and and sort of how much does it know when it's navigating
00:26:33.580 | through like i was thinking like if i wanted to return a pair of sunglasses that broke would i would i be
00:26:39.820 | able to just say like start in google and then find this company's website find a way to you know
00:26:45.340 | engage support open a ticket like how much sort of like how vague can you be and how smart is it
00:26:52.300 | currently would that would that yeah i mean the more instruction you give obviously better but
00:26:56.780 | it's able to understand how to navigate a website that's what the model is trained on so if you say
00:27:00.620 | you know go to this sunglasses website it doesn't it probably wasn't trained on a specific sunglass website
00:27:05.180 | but it can understand that button is support you know this button opens a ticket so it understands kind
00:27:10.060 | of the general knowledge of how to navigate the website but if there's something very intricate
00:27:14.140 | about that website you're going to have to encode it in the text like make sure you click
00:27:17.740 | button x first or whatever so it understands how to navigate websites got it and does it understand
00:27:23.500 | when it's failed yeah sometimes sometimes i've seen it sometimes get stuck in a loop and like oh no
00:27:29.020 | i keep scrolling i keep scrolling i keep scrolling it doesn't know when to stop so again this is in
00:27:33.180 | research preview so things are getting better the model is getting updated behind the scenes but
00:27:37.420 | it's not like it's not agi so that's got it and one last question um how is it in terms of navigating
00:27:44.540 | like distrustful parts of the internet i mean there's a lot on the internet that we see and we know is not
00:27:49.420 | to be trusted or it's something not to be followed how have you sort of worked around that problem yeah
00:27:54.620 | because again it is a model in the background so it's going to understand like if you're doing something
00:27:58.700 | it's not going to want to click that or might be there's safeguards in place so that's built into
00:28:03.820 | the model but again uh it is in research preview you still have to explicitly say what buttons to press
00:28:09.100 | for certain actions but again the model it is a trained llm it's going to be able to understand the nuances
00:28:14.780 | and say if it can't take this action or can't do that that could happen but i haven't seen that use
00:28:19.580 | case but if you keep pushing it maybe you'll find those those things well the thing i had in my mind is
00:28:23.580 | like if you go to a site where you have to download a link sometimes there's an ad that says download
00:28:28.860 | a link and you know that that's just an ad trying to get your attention of course would the model know
00:28:33.100 | or is that some yeah like for example like in the the amazon.com it shows an ad for something but
00:28:39.020 | i said find the first thing and it was able to scroll past that ad and click something so the model understands
00:28:43.180 | the task you give it so yes it can understand that thank you all right so this this is finished yeah
00:28:51.420 | that's really quick it showed it was able to find all the monitors give me the size the rating
00:28:55.820 | the price for each of the monitors so again it executed that in parallel it got me the nice information
00:29:01.420 | and that that's kind of the idea of like it can do parallel execution in the background so you don't
00:29:05.740 | have to wait for it and you don't see it actually clicking through the the task and you get your information
00:29:10.060 | all right one more question then we'll move on to the mcp part so nova is specifically meant to be used
00:29:18.060 | with the browser correct uh so nova act so amazon nova is a family of models on amazon so if you go to
00:29:26.220 | this website nova.amazon.com you see there are different foundation models like nova pro premier
00:29:32.540 | lite micro these are like the text understanding models so like your typical llm calls there's also an
00:29:38.940 | image model called nova canvas that can generate images then the video model called nova reel which can generate
00:29:45.100 | videos and there's also a speech model speech to speech called nova sonic so nova is a family
00:29:52.140 | of uh foundation models by amazon to do all these type of tasks and act is just another one for
00:29:57.820 | browser automation are there plans to expand this like beyond the browser so that we can someday take
00:30:04.780 | actions in slack or ide or anything outside of the browser maybe some of the team is here so maybe talk with
00:30:13.020 | them later thank you all right so i'm going to move on to the mcp part banjo yep nova act is only available in
00:30:24.540 | the u.s yes right now nova act is only available in the u.s it's in preview so it's just getting started so
00:30:32.140 | if you log in from like a different uh account like address like uk or something it won't work so
00:30:37.580 | it only works in the u.s at the moment yes all right one more question over there and then i'm going to
00:30:45.260 | move on with the three monitors um i got the same results as you did but i actually got a different
00:30:50.540 | price with the samsung odyssey why do you think that might be oh
00:30:54.060 | your amazon.com is different i don't know yeah because it is opening up a different browser so it could have
00:31:03.580 | clicked something differently yeah so yeah that's right
00:31:07.980 | we can actually look at the video playback to see what the results were like
00:31:14.140 | one more okay one more quick quick one are there plans to support persisting browsing data such
00:31:23.980 | as cookies in the cloud browser so right now it's opening up its own browser but you can also set like
00:31:30.060 | your own like chromium profile and open up that browser so all the things you have saved there like
00:31:34.940 | you want to log into your stuff you can set your own custom browser but right now it opens up a new
00:31:39.420 | like completely new browser without anything saved
00:31:41.900 | all right so i want to show uh i actually made an mcp server for nova act so module two is going through uh mcp
00:31:53.660 | and i can kind of show you what i did for the mcp server uh in fact we can use the amazon q here so i'm
00:31:59.740 | going to ask it uh can you tell me about the nova act mcp server
00:32:14.620 | can you tell me what it does and oops
00:32:20.380 | so tell me about the nova act mcp server
00:32:25.820 | so you can see it's going through uh integrates nova act browser with mcp it has the browse session tool
00:32:34.060 | browser action execute parallel tasks take screenshots close browser list results so i created these different
00:32:40.380 | aspects of the mcp server so i could use something like claude desktop or cursor or amazon q cli to
00:32:46.780 | say you know open amazon.com and find information for me so it's portable it understands uh so i
00:32:53.020 | don't have to actually write code i can say so go to amazon.com and find me the first coffee maker
00:32:58.060 | it'll actually write all that code i did in the initial one to do that or the multi-monitor so i wrote a
00:33:03.660 | bunch of code to do this if i just said you know get me these three monitors to get the price it would
00:33:08.460 | actually write all the nova act code it needs to do that using the mcp server so that's kind of the
00:33:13.100 | power of mcp that i just describe a task and then it can encode the actual browse action things it needs
00:33:19.100 | to so and then i also made an mcp client that can actually interpret that so oops it connects to the
00:33:28.140 | mcp server it runs the code and is able to query bedrock uh i am using a model so i'm using claude 3.5
00:33:35.660 | sonnet here because it's an mcp client and needs to have an llm behind that and then it's able to
00:33:40.860 | you know understand which tools to use uh run the code and open up the browser and whatnot so let me
00:33:47.100 | just run the example here so module two
00:33:55.100 | so er open the file just did that we asked amazon q to explain the file to us and now we're actually
00:34:04.540 | going to run it so python 3 and then i can open this up
00:34:23.820 | okay so let's be adventurous can somebody give me a query to try if anyone has an idea i'm going
00:34:30.940 | to just ask it and do something so someone give me an idea of what to run with nova act fix wi-fi
00:34:37.340 | how would you fix the can you find a website to fix wi-fi can
00:34:46.780 | find find let's see website to fix wi-fi use headless mode
00:34:54.460 | i spelled it wrong but let's see
00:34:59.900 | all right it goes to google.com how to fix wi-fi problems troubleshooting guide in the box and press enter
00:35:12.540 | return a list of the website title descriptions all right it's going through that so it opened
00:35:19.100 | google.com how to fix wi-fi problems i see an empty search bar where i can type queries for search
00:35:25.420 | information i should type how to fix wi-fi problems so you can see it's understanding what to do
00:35:29.740 | you know oh it hit a recaptcha page so okay the search results are not available blah blah so
00:35:38.140 | so see it looks like it got stuck in a recaptcha page so this is like a headless agent so someone
00:35:42.220 | asked a question about getting past captcha and whatnot you see that it got stuck doing that
00:35:46.220 | it looks like it's stuck in a loop now so it sees the captcha again so i should click the skip button to
00:35:53.740 | skip the captcha window the captcha is still open so it's probably going to be stuck here unless i close
00:35:58.620 | it so you can see there are limitations it's not going to pass captchas and whatnot but that was a good
00:36:03.580 | query to show that it oh did it fill it it's still open so it's going to be stuck here so i'm just going
00:36:08.300 | to close it out but you can see you know it can't pass everything it can't navigate through websites so
00:36:13.260 | something like that will not work so that was a great test example to show
00:36:17.820 | if i use the baked in one you know find a coffee maker under 50 dollars it'll be able to go through
00:36:24.860 | that and use headless mode but any questions on that seeing how the mcp server is working i didn't have to
00:36:30.380 | write code i just said do something it actually wrote the code to do it for me
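The CAPTCHA loop in the demo above suggests a simple guard: stop the run when the agent keeps reporting the same thing. The following is a minimal sketch of that idea using only the standard library; `detect_loop` and the sample observation strings are hypothetical and are not part of Nova Act or the speaker's MCP server.

```python
from typing import Iterable, Optional


def detect_loop(observations: Iterable[str], max_repeats: int = 3) -> Optional[int]:
    """Return the index where one observation has repeated max_repeats times
    in a row, or None if that never happens."""
    streak = 0
    previous = None
    for index, observation in enumerate(observations):
        # Extend the streak on a repeat, otherwise start a new one.
        streak = streak + 1 if observation == previous else 1
        previous = observation
        if streak >= max_repeats:
            return index
    return None


if __name__ == "__main__":
    # Made-up log of what an agent reported at each step.
    log = ["search page", "captcha open", "captcha open", "captcha open"]
    print(detect_loop(log))
```

A caller could close the browser session at the returned index instead of waiting for someone to close it by hand.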
00:36:34.220 | question over here
00:36:37.500 | yeah yeah so a question about if i can actually go into the browser and do it myself yeah if i ran this
00:36:46.060 | locally on my machine it will actually be able to you know open up the browser and i would have to click
00:36:50.540 | the button and it'll continue doing that right now i'm running it within the browser so i'm running
00:36:56.220 | everything in headless mode so we can't interact with that
00:36:58.540 | so you can see it's able to find search under 50 dollars it can actually look at the website
00:37:06.940 | it's found search results on amazon.com so yeah so for that use case where we're not passing
00:37:12.860 | captchas it's able to continue and find the information there
00:37:20.300 | so a question about can i actually order something if i use my own browser session and like logged in
00:37:25.100 | at my amazon.com account and said yes order this for me you know click through it will be able to
00:37:30.140 | understand that thing but i would have to put it in i would have to use my own browser session so i
00:37:35.020 | i wouldn't want to log in by myself so yeah
00:37:39.820 | another question
00:37:44.460 | if you give nova act the authentication for amazon for example like you give it your login details
00:37:53.100 | then can it log in and complete that action for you yeah but if i say this is my username this is my
00:37:58.860 | password enter that into that field and it'll be able to understand you know this is a sign-in button and i
00:38:03.820 | have this information but again this is all python code so yeah you can encode it you can make it an
00:38:08.380 | environment variable so it won't read it directly so a lot of ways to do that does it also like
00:38:12.540 | understand 2fa let's say it asks you to go to your gmail and you will then open the gmail website check
00:38:19.180 | the email if you're logged in again on your session and then input it or is it okay well if there's no
00:38:24.380 | captcha like we just saw with the captcha yeah so there's nothing blocking so but yeah again nova act
00:38:29.820 | is free to use so there's a lot of creativity in this room so i think we should have like a
00:38:33.500 | nova act hackathon i think that'll be you know do something crazy with nova act
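The environment-variable approach mentioned in the login answer above could be sketched like this. It is an illustration under assumptions: the variable names `SHOP_USERNAME` and `SHOP_PASSWORD` and both helper functions are hypothetical, and how the password is then typed into the page depends on the SDK and is not shown.

```python
import os
from typing import Mapping, Tuple


def load_login(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read login details from environment variables (names are hypothetical)."""
    try:
        return env["SHOP_USERNAME"], env["SHOP_PASSWORD"]
    except KeyError as missing:
        # Report which variable is missing without printing any secret value.
        raise RuntimeError(f"Set {missing.args[0]} before running the script") from None


def login_instruction(username: str) -> str:
    """Build the natural-language step; the password is deliberately left out."""
    return f"enter the username {username} and click on the password field"
```

Keeping the password out of the instruction string means it does not end up in the prompt text or in logs of the prompt.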
00:38:37.260 | all right so one more question yep one more can i book a flight when my price alert is less than
00:38:48.620 | a hundred dollars it's like continuously checking you'd probably use something else for that but yeah i mean
00:38:54.060 | nova act can open up that website it can just have a query every day you know open google flights and
00:38:58.940 | look at the quickest thing and if something is below this threshold you know send me an email so again
00:39:03.580 | this is all a python script so you can set up something that triggers like once a day like in a
00:39:08.300 | lambda function and so yes totally possible so nova act is very flexible and because it can run in headless
00:39:14.380 | mode you don't need to have that ui so that's really what makes it helpful for interacting with websites
00:39:19.020 | that don't have a native api
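The once-a-day price check described above can be sketched as a small script. This is a sketch under assumptions, not the speaker's code: `fetch_prices` stands in for whatever the browser agent returns, `notify` stands in for sending an email, and the scheduling itself (for example a daily trigger) is left out.

```python
from typing import Callable, Iterable, Optional


def cheapest_below_threshold(prices: Iterable[float], threshold: float) -> Optional[float]:
    """Return the lowest price strictly below threshold, or None if there is none."""
    below = [price for price in prices if price < threshold]
    return min(below) if below else None


def run_daily_check(
    fetch_prices: Callable[[], Iterable[float]],
    notify: Callable[[str], None],
    threshold: float = 100.0,
) -> bool:
    """Run one scheduled check; returns True when an alert was sent."""
    best = cheapest_below_threshold(fetch_prices(), threshold)
    if best is None:
        return False
    notify(f"Flight found for ${best:.2f}, below your ${threshold:.2f} alert")
    return True


if __name__ == "__main__":
    # Stand-in for the browser agent: a real version would ask the agent to
    # open the flight search page and return the prices it sees.
    run_daily_check(lambda: [182.0, 96.5, 240.0], print)
```

Passing the fetch and notify steps in as functions keeps the threshold logic testable without a browser or an email account.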
00:39:27.740 | thanks yeah this is pretty cool i'm a little bit confused like we have the nova act sdk api key and
00:39:37.500 | we're also doing some stuff in bedrock ah yeah so how does this actually work yeah so the nova act
00:39:45.020 | api key is separate but for this mcp client i did it actually needs a large language model to understand
00:39:51.580 | what's happening so if i go to claude oops i actually said i'm actually using claude sonnet
00:39:59.820 | 3.5 for my mcp server so that's how because i just asked it you know find that website for me
00:40:05.820 | how does it know about any of the code doing that so it's using a large language model underneath
00:40:10.780 | the hood to actually find that information so that's where we use bedrock for trying to find it in the
00:40:17.020 | code but it's on it yeah i set the model id so your assistant you're an ai system helping you have tools
00:40:24.860 | you're using claude 3.5 sonnet you're making an api call to bedrock whenever something happens so
00:40:30.380 | that's where the llm we're using is but nova act is separate from that so instead of using you
00:40:35.820 | know claude desktop it's running an llm inside of that that would understand that for the mcp server
00:40:41.500 | a question here a question
00:40:47.340 | uh the question is uh does it integrate with browser plugins as well like could it integrate
00:41:00.380 | with lastpass if you have the lastpass plugin fill in the credentials through lastpass and then
00:41:05.020 | continue i haven't tried that but again you can set it up to use your own browser so if you do that
00:41:10.220 | and if that's integrated it might be able to do that and click through that but i have not tested that but
00:41:15.020 | something to try out thank you the biggest problem you will face is two factor like even if you gave
00:41:22.380 | it a password like if you're using something like google authenticator or something that would be like
00:41:27.180 | the biggest problem to capture but other than that if you provide it an environment variable or if you
00:41:32.380 | give it instructions on how to access lastpass in the browser you should be able to do it all right and
00:41:38.860 | uh oh one more question then we'll go on to the last module
00:41:46.220 | so clearly there are a lot of different uh agent architectures you could use um and what i can
00:41:58.300 | imagine using this is uh like you have a coordinator agent set up somewhere that's running in the overall
00:42:04.860 | app and then when something pops up and says hey you need to go and look this up online go and check it
00:42:10.220 | uh it should mod so my question is how modular
00:42:14.220 | i mean it's just python so it should be pretty modular right is that the way in which you're
00:42:19.740 | imagining the architecture to be is just if i was coding a coordinator agent in langchain or langgraph
00:42:26.300 | for example it would then call your sub-agent and run its stuff and then get
00:42:33.100 | a text-based output that i throw into my message queue that's how it all integrates together is that right
00:42:40.220 | yeah that's one way you can do it so nova act again right it's python so it could be a tool it could be
00:42:44.620 | an api call and the next module we're actually going to show you how to actually make an agent from that so
00:42:49.260 | good tee-up right here uh so um du'an talked about strands uh at the beginning so strands is a new
00:42:57.100 | agentic framework launched by uh aws so let me open up the link uh it's easy as a pip install strands
00:43:04.540 | and the first agent is like agent equals that so it's very it's a model first uh way of interacting with
00:43:11.500 | agents if you've used a lot of agent frameworks in the past there's a lot of bootstrapping and making sure
00:43:16.460 | everything is correct and like but that was necessary for kind of the older models like
00:43:20.380 | if you think back to like llama 2 for example like how far our models have evolved since then
00:43:25.740 | so but now we can bypass a lot of the you know bootstrapping we did previously the agent can figure
00:43:31.900 | that out so we don't need all these very uh heavy ways and like you know make sure everything's typed
00:43:36.940 | and whatnot so here's a very simple example of how i actually spun up uh and also it has mcp
00:43:44.060 | native support so in this example i actually have two mcp servers uh i have the aws documentation and
00:43:51.500 | aws diagrams mcp server so if you go to this like aws labs mcp these are the official um aws mcp servers
00:43:59.980 | and there's a bunch of different ones from like a cost analysis nova canvas diagramming cloud formation
00:44:06.460 | lots of different ones here uh so again it's all on github aws labs mcp but the example i do here is
00:44:13.900 | i'm actually i made like a solutions architect agent your role is to help customers understand
00:44:18.780 | this building on aws and i define these two mcp servers here i give it a prompt and i say this agent
00:44:27.020 | has all the tools in the mcp server it has a bedrock model i'm using claude haiku here and what's cool
00:44:33.740 | about strands it can also use like litellm and ollama so it has access to lots of different things or
00:44:39.340 | you can run it locally and of course it has access to amazon bedrock so that's what we're using here
00:44:43.500 | so all those three things make the agent the tools the model and the system prompt and then i can say
00:44:50.540 | get the documentation for aws lambda and create a diagram of a website that uses lambda so let me run this code
00:45:18.220 | okay so it uses uv to install the mcp server locally a lot of people ask where does mcp run this
00:45:25.100 | is running locally but there are other ways to run it like in a lambda function and whatnot but for just
00:45:29.660 | testing it out it pulls down the the mcp server locally and runs it you can see it's already executing
00:45:35.260 | so let's make this a bit bigger uh so it says okay i'm going to help you with that first i'm going to search
00:45:45.020 | the aws lambda documentation i'll read the documentation then i'll create a diagram
00:45:49.260 | illustrating a static site so you can see it does a post request to do the search so the mcp server
00:45:54.460 | defines where everything is i don't have to like feed it in the well-architected framework the aws
00:45:59.900 | documentation is always updated so it just knows call the search function it got the lambda welcome file
00:46:05.180 | it put that in it's able to generate the diagram it generates the diagram it tells us what is going on how
00:46:12.540 | the workflow looks like it tells me it saved the diagram to this location i can open it up generated
00:46:18.220 | diagrams oops and now it's very small let me see if i can make this bigger
00:46:26.060 | there we go so it was able to generate the diagram for me so all through that about uh you know 40 lines of
00:46:35.820 | code i have two mcp servers i have my prompt and it's able to understand that get that and just generate
00:46:41.340 | something for me with that uh so that's very easy to get started with strands of building agentic
00:46:46.700 | workflows i know agent means a lot of different things to different people but you know if you have
00:46:51.580 | tools the model the system prompt do some type of action and strands makes it extremely easy to do that
00:46:57.820 | if i use other frameworks it could be a lot more code to do something like that especially integrating mcp
00:47:02.940 | natively like that i'm going to pause here for any strands questions
00:47:08.300 | there's coming
00:47:15.020 | um i know bedrock already had its kind of agents sdk so is strands replacing that or is this now the pro is
00:47:25.580 | this replacing that or is supposed to complement that like is this the preferred way of creating agents with
00:47:30.620 | models in bedrock yeah well when it comes to preferred way it always comes down to your use
00:47:34.860 | case so the bedrock agent has a lot more i guess opinionated ways to do things as you can do through
00:47:40.220 | the console it has built-in support uh right there in aws while strands is more it's an open source
00:47:45.900 | framework so you can download the code you can use other models through that like litellm or ollama
00:47:51.100 | if you use bedrock agent you can't run that offline so there's different use cases different developer
00:47:56.220 | tooling i mean me as a software engineer i like you know code first doing things so it does
00:48:00.540 | depend on your use case what you're trying to do in your experience okay can you show the code real
00:48:04.700 | quick yeah yeah this is the code yeah just show the agent so this is an open source framework if you go
00:48:12.940 | where it says agent you and it says model right now we're using a bedrock model but you can use another
00:48:18.140 | model with litellm yeah so you don't need aws at all in that instance
00:48:23.420 | you can use ollama you can use open ai you can use right
00:48:26.940 | yeah so there's documentation anthropic litellm a lot of different model providers
00:48:37.660 | ollama open ai so it's an open source framework so you can use it however you want so but yeah that's the
00:48:43.100 | idea of strands open source model agent development kit one question suppose i want to build a text to
00:48:50.860 | sql agent and i have say 15 tools already built in that i want this agent to be able to use if i use this
00:49:00.060 | framework how can i make sure that the agent knows when to use the right tool and the sequence yeah great
00:49:10.060 | question uh so in this example i have a weather agent so one thing you said you already have tools
00:49:17.580 | what i like about strands a lot is i can take a python function i already have and just put this tool
00:49:23.100 | decorator and that's it you know you don't have to put anything else it understands this is the uh what you
00:49:27.900 | need to do and then when i go to the agent i have these tools and it also has the native tools
00:49:33.900 | we're going to be using http request is a standard tool in the strands framework so in this example i'm
00:49:40.220 | like asking what is the weather in seattle and then also how many words are in this response
00:49:44.780 | this is an open api api.weather.gov where you don't need an api key and it can find the information for
00:49:51.340 | you so i'm going to update this to san francisco and this started wrong but it'll figure it out a
00:50:01.100 | weather example where the word count and i was very specific you know find the weather first and then
00:50:06.300 | how many words are in the response so it's able to use that tool it gets the forecast and then it knows
00:50:11.340 | to use that word count tool next so we're passing a lot of the information to the model the models are very
00:50:16.940 | smart now we don't have to say do this do this do this just let the agent figure it out that's kind
00:50:21.500 | of the role of the agent you give it the context and tools necessary it figures out the best way to
00:50:26.300 | solve the problem but then wouldn't it be prone to hallucination when you give it 20 tools and then
00:50:32.300 | because we've tried that with aws no the similar things when you bind more i think more than 10 tools
00:50:41.020 | it's going to sure it is always you know a balance but again the models are much better like try
00:50:47.180 | using claude 4 sonnet is it hallucinating as much like these newer models are much better for
00:50:51.580 | understanding the concept and understanding what tools to use with the older models sure they get confused
00:50:56.620 | there's so many things but i'm very confident on these newer models they can understand your use
00:51:00.860 | case and what tools are available and figure out the best way to solve the problem so then with this framework
00:51:05.340 | there wouldn't be a way for you to orchestrate a customized flow but more like you give the
00:51:12.620 | control to the agent you could if you want to have like specific like do this specific way uh there are
00:51:19.020 | different ways in strands uh with something called workflow mode where you actually say uh you know this
00:51:26.060 | is the workflow i want to do research results analyze things write the final report if you have to do
00:51:31.260 | something very sequential strands has that i won't have time to go through all the different you know
00:51:36.540 | ways to do multi-agent collaboration and whatnot but this for that specifically like i wanted to do x y
00:51:41.900 | z first the workflow way can do that so yes then is it possible say um i don't have a predefined
00:51:50.620 | workflow but i know it needs to figure out the right workflow then then that's what i just did there
00:51:56.140 | you know i just gave it a sentence and figured it out i see yes okay perfect thank you
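The weather-then-word-count exchange above comes down to two plain functions called in sequence. The sketch below hard-codes that order to show what the two tool calls amount to; in the demo the model chooses the order itself. Everything here is an assumption for illustration: `get_forecast` returns canned text rather than calling a weather API, and the `TOOLS` table and `run_plan` helper are hypothetical stand-ins, not the Strands API.

```python
def word_count(text: str) -> int:
    """Count the whitespace-separated words in a piece of text."""
    return len(text.split())


def get_forecast(city: str) -> str:
    """Stand-in for the HTTP weather lookup; returns canned text, not real data."""
    return f"Forecast for {city}: sunny with light wind"


# Name-to-function table, roughly what a framework builds from decorated tools.
TOOLS = {"get_forecast": get_forecast, "word_count": word_count}


def run_plan(plan, tools=TOOLS):
    """Execute (tool_name, argument) steps in order.

    An argument of None means "feed in the previous step's result".
    """
    result = None
    for name, argument in plan:
        result = tools[name](result if argument is None else argument)
    return result


if __name__ == "__main__":
    print(run_plan([("get_forecast", "Seattle"), ("word_count", None)]))
```

With an agent framework, the plan list would be replaced by the model's own tool choices, which is the part the speaker says no longer needs to be spelled out.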
00:51:59.580 | so claude 4 has something called interleaved thinking i believe that's what it's called where
00:52:08.140 | it can handle multiple tools processing much better than most models today so if you're passing in 20
00:52:14.860 | tools it's able to work through the agentic loop to really figure out which tool to run and it's also
00:52:21.340 | able to run parallel tool calls so rather than just say okay here's the objective let me run this tool it
00:52:28.860 | can say here's the objective let me run this tool this tool this tool this tool and this tool and then
00:52:33.340 | process the results and determine what needs to happen next so i would try claude 4 like
00:52:39.660 | banjo mentioned then last example really quick uh again you know strands i made my nova act mcp server
00:52:48.220 | and it can actually run that you know i just define this is the mcp server use the nova act mcp you know
00:52:53.900 | use claude so same type of thing i could have another agent you know use uh nova act as well
00:52:59.500 | so strands makes it very easy to build these agentic workflows uh so i really enjoyed the developer
00:53:06.220 | experience of using strands and you know i already have the mcp server we see the same exact example
00:53:11.500 | before so once you have the mcp server it's very easy to plug in into different uh architectures and
00:53:16.860 | strands makes it very easy to accept that but yeah those were the three modules really about how to use
00:53:25.980 | strands uh mcp then amazon nova act again strands is open source you can download it pip install strands
00:53:35.260 | if you just type strandsagents.com it'll take you to the documentation uh again also nova act
00:53:41.340 | nova.amazon.com it's free to log in and then i think that's all the time we have but we do have a
00:53:50.300 | survey uh and you can get aws credit code by filling out this survey so banjo i have a question about that
00:53:57.100 | workflow thing in strands when you create these individual agents can you define which tools are
00:54:03.660 | passed on to each agent yeah yeah it's a great question to talk about different agents uh running out
00:54:08.220 | of time but i'll quickly show uh i have a multi-agent example i believe
00:54:13.340 | oh i think you had it in the docs yeah yeah it's in the docs yeah yeah yeah each of these is a different
00:54:21.980 | agent so you know this is an agent you can have a different system prompt you have different tools so
00:54:26.700 | you're just defining the agent and then yeah you can have different tools different whatever there
00:54:30.540 | different models and then the workflow would just call that so yes completely customizable so that's
00:54:35.420 | the good thing about strands it's very easy to customize and build uh scalable solutions like that
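The answer above, that each agent in a workflow can carry its own system prompt and its own tools, can be pictured with a small data structure. This is a minimal sketch under stated assumptions: `AgentSpec` and `run_workflow` are hypothetical names, no model is called, and each stand-in agent simply applies all of its tools in order where a real agent would let the model choose.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentSpec:
    """Hypothetical description of one agent: its own prompt and its own tools."""
    name: str
    system_prompt: str
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def run(self, task: str) -> str:
        # A real agent would let the model choose among self.tools; this
        # stand-in simply applies every tool in insertion order.
        for tool in self.tools.values():
            task = tool(task)
        return task


def run_workflow(agents: List[AgentSpec], task: str) -> str:
    """Pass the output of each agent to the next one, in order."""
    for agent in agents:
        task = agent.run(task)
    return task


# Two agents with different prompts and non-overlapping tool sets.
researcher = AgentSpec("researcher", "You gather facts.", {"search": lambda q: f"notes on {q}"})
writer = AgentSpec("writer", "You write reports.", {"format": str.upper})

if __name__ == "__main__":
    print(run_workflow([researcher, writer], "lambda pricing"))
```

The point of the structure is that the writer never sees the search tool and the researcher never sees the formatter, which matches the per-agent tool assignment asked about in the question.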
00:54:39.580 | thank you and then again uh here's the survey you can get aws credits for filling out this thing
00:54:45.820 | tell us how we did what you liked what you want to learn more and now go build
00:55:03.260 | yeah any other questions i think we have a minute
00:55:12.860 | thanks for the presentation um so as these systems develop i think that it's reasonable to assume that
00:55:21.900 | they would emerge as an increasingly effective vehicle for committing fraud online at scale which would push
00:55:29.740 | businesses to implement more things like captcha which kind of decreases the surface area that
00:55:35.580 | tools like this would be applicable so what is the long-term strategy for that well you already saw we
00:55:41.660 | failed with captcha today like you know we're not trying to bypass captcha we're not trying to break things
00:55:45.820 | you know responsible ai is very important to amazon so no we're not trying to let this tool commit fraud
00:55:51.100 | you know you have to have an api key so it could be monitored so use cases like that will be shut down
00:55:56.220 | we're done
00:56:00.540 | i think we're done yeah so thank you all
00:56:05.180 | i think it's finished
00:56:09.260 | oh we can keep going we have more time oh the clock the clock ran out so i thought we were kicked out
00:56:19.980 | all right well more questions then i guess i thought yeah another question
00:56:26.940 | so regarding nova act let's say that i have a headless browser in the cloud is there a way to connect
00:56:42.540 | nova act to my custom browser instance in the cloud yeah yeah yeah you can there's a way to like put your own
00:56:48.140 | browser instance so yeah nova act supports that so totally possible yeah thanks
00:57:00.700 | all right let me go to nova act github page
00:57:07.100 | and it's just some examples there
00:57:17.180 | i think right because it says start at one and then you have 120 minutes
00:57:20.700 | i know i just told him okay but i think what happened that time
00:57:23.340 | oh yeah no you guys can keep rocking it okay yeah
00:57:30.060 | so yeah there's a way to set up your own user agent for nova act so definitely possible
00:57:44.220 | there's a lot of time so i don't know if anyone actually got into the workshop so we can still
00:57:58.780 | build some stuff or i can try some other examples
00:58:11.260 | that's a lot of time so we can try to make a streamlit app with nova act so we can try that one
00:58:16.300 | so we can try if that works
00:58:29.420 | oops oops
00:59:02.380 | so one example i tried i tried to make a streamlit app that uh looks for like the top five uh playstation games on gamefaqs and then creates an image like a nice graph for me
00:59:16.460 | but it can fail so uh i think that's one of the issues there i think it failed at one of the steps
00:59:22.140 | there uh let's see oh nova act got an error so it couldn't navigate gamefaqs.com so it does
00:59:33.660 | fail at some of the things so that's you know again research preview you have to be more specific on how it goes
00:59:39.020 | through things uh oh yeah let me show you where the code is just so you can have an example let me pull up
00:59:45.500 | the code
00:59:45.980 | yeah let me try let me set up my local machine so we can see how it works
00:59:55.500 | oh yeah go for it does nova act depend on like uh semantic html and like good web design to actually
01:00:12.700 | work i mean it understands the actual page so it can click through those things but if the page
01:00:17.900 | like doesn't have like a search box or button it might not be able to navigate so as long as
01:00:21.900 | it can see the page it can understand where to click and then click those correct buttons so
01:00:25.740 | i guess maybe a follow-up is there any like efforts to do like experimental like engagement on the page
01:00:33.100 | so if it comes on a page that it's not familiar with maybe it would try and act like a human would
01:00:38.380 | to like click on things or try things out depending what you you put in that prompt because again you're
01:00:44.060 | creating that workflow what it should do so if you say you know explore this website and find things
01:00:49.020 | it will try to click through that but again it's up to kind of what that initial prompt is that you
01:00:53.820 | have for it yeah when you're using nova act you're kind of giving it step-by-step instructions when you're
01:00:59.260 | using the sdk so that way if you kind of know it's an obscure website you can give it those instructions
01:01:06.540 | that it needs to perform rather than the mcp server is using natural language to infer what needs to be
01:01:13.660 | done so there's not specific instructions coming from you unless you provide it
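The contrast drawn above, explicit step-by-step instructions through the SDK versus one natural-language request through the MCP server, can be sketched as a loop over small instructions. The sketch assumes a hypothetical `act` callable that takes one instruction and reports success; it stands in for the browser agent and is not the Nova Act API. The step strings are made up for illustration.

```python
from typing import Callable, List


def run_steps(act: Callable[[str], bool], steps: List[str]) -> List[str]:
    """Send small instructions one at a time; stop at the first step that fails.

    `act` is a hypothetical callable standing in for the browser agent: it
    takes one instruction and returns True on success.
    Returns the list of steps that completed.
    """
    done = []
    for step in steps:
        if not act(step):
            break
        done.append(step)
    return done


# Example of a task broken into explicit steps instead of one broad request.
COFFEE_STEPS = [
    "search for coffee maker",
    "filter results to items under 50 dollars",
    "open the first result",
]
```

Because each step is small, a failure points at one instruction, which is easier to reword than a single broad prompt such as "make something cool".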
01:01:27.100 | yeah so i'm going to run it locally on my machine just to show an example let's see
01:01:32.140 | oh let me hide my key for a second because it's being recorded
01:01:43.820 | all right
01:01:54.860 | python get coffee thank you for coming
01:01:59.900 | all right so i'm just running it locally on my machine so without headless mode so you can see
01:02:07.740 | it opens up the browser
01:02:08.860 | it's able to type coffee maker
01:02:17.180 | so what we're looking at now is not in headless mode this is actually nova act actually
01:02:24.620 | performing the task in the browser so yeah a lot of questions about how does it work you know
01:02:30.220 | and we can try more complicated examples i just wanted to show it can work on your machine
01:02:34.700 | and you can see the log you know i'm just looking for and if i like change the page while it's doing
01:02:43.420 | something it's going to like mess up so i'm going to click the page and see what it does like so someone
01:02:47.580 | asked about clicking things of that nature what's it going to do now
01:02:54.380 | so see it crashed now because i changed to a different page it didn't know what to do so
01:03:03.740 | examples you can interact when it's when it's going through the motion as well and then
01:03:09.580 | i believe i have an uh consider the mcp server i set up a claude instance
01:03:24.940 | and then i have my nova act mcp server there so i'm able to actually you know i click this you can see
01:03:32.060 | all the tools it has available so i can ask it to like navigate a website so
01:03:36.860 | anyone having a complex example you can see the mcp server so i know some people have been asking some
01:03:45.260 | complex examples so go ahead and give me one yeah you got you got one one one one question i had is uh
01:03:53.820 | can nova support like drag and drop functionality you can try it do you have a specific website that
01:04:01.820 | that has like drag and drop
01:04:16.220 | draw io uh let's go to draw.io and make a cool diagram use nova act
01:04:27.900 | i want to see what happens
01:04:32.700 | all right so let's go to draw io all right it opened the page
01:04:45.660 | do i have to accept something nope it's going oops all right open dry let's see
01:04:50.380 | and then i'll make this smaller
01:04:56.700 | wait for page to load look at my initial setup for template selections all right it's going
01:05:08.860 | oh it crashed what happened oh do i have to allow allow always i took a screenshot
01:05:18.460 | i need to continue the browser session to see what's available let's look at the screenshot all right it's
01:05:31.180 | it's opening up again uh it's going to draw io
01:05:36.300 | yeah if i keep clicking away it clicks back to the browser session so i need like two monitors
01:05:57.180 | let's see let's see let's see is it going to figure out how to use draw io
01:06:00.620 | wait for page take screenshot look for template options come a blank paper all right it's
01:06:08.860 | so it's kind of i didn't give it any specific instructions i just said make something cool so
01:06:14.060 | maybe that's too hard to interpret for this website maybe i have to say click this click the square button and
01:06:19.820 | then drag the square to the center or something i might have to be more explicit for that
01:06:24.220 | so it seems it seems to have frozen all right it's clicking something all right click new
01:06:32.540 | oh okay it's doing stuff
01:06:43.420 | again it's not like super real time it's going it's not like instantaneous but it is clicking
01:06:48.540 | through the buttons clicking through stuff all right did it do anything oh Claude
01:06:56.460 | so it looks like it failed so yeah it looks like Claude failed that one so i won't blame Nova for that but
01:07:03.420 | that's like that's the idea so thanks for trying to do something hard
01:07:08.540 | okay another question back there oh yeah can we try another one yeah let's try another one sure can
01:07:14.700 | we do uh you know on Google Maps find the top three rated coffee shops within a mile radius of
01:07:22.380 | this hotel top three coffee shops near the Marriott Marquis in San Francisco
01:07:34.540 | you'll figure it out
01:07:41.980 | all right open Google Maps search Marriott Marquis San Francisco wait for results to load so it has a
01:07:50.940 | plan it's going to do something so let's see it opened Google Maps
01:08:01.340 | all right it has Marriott Marquis San Francisco so it's able to type that
01:08:04.460 | okay it searched it found the Marriott Marquis
01:08:13.820 | so there's a coffee button let's see if it clicks that i'm curious
01:08:23.420 | looks like it's frozen give it a couple more seconds
01:08:40.540 | whether to click it got this 15 minutes i was trying to type in that box okay
01:08:51.980 | all right
01:09:00.380 | all right it's typing coffee shops all right all right it's going
01:09:05.900 | all right so it'll open the coffee shops and let's see if we can get those top three there's a
01:09:18.780 | 4.8 a 4.7 another 4.7 let's see if it can get that
01:09:31.100 | did it crash
01:09:39.660 | i think it did it but i think
01:09:44.300 | i'm gonna blame Claude Claude Desktop might need a different MCP client
01:09:52.380 | i think yeah i think Claude Desktop doesn't like doing that but again because it's an MCP server i can open
01:10:01.260 | up a different MCP client so i can open like Cursor for example and ask it questions through that
01:10:08.300 | cursor
01:10:15.980 | let me go this
01:10:18.300 | and then
01:10:20.620 | you see it has the MCP tools oops it has this up let me just open up a new one
01:10:27.340 | i can do the same thing and use Nova Act
01:10:37.580 | and then it's calling the MCP tool again so that's the beauty of MCP i already have this server i can
01:10:46.780 | just use a different client it can understand all the information it needs to and do the exact same command
01:10:51.500 | so it's going to do the same thing Cursor might be smarter than Claude
01:10:56.140 | but yeah it's able to do the exact same type of thing so
01:11:00.380 | another question over here
01:11:04.780 | yeah i just got a question on the
01:11:07.100 | the Nova Act model yeah
01:11:08.620 | that model is that running in the cloud
01:11:11.980 | yes so the question was where is Nova Act running and yes it's running in the cloud
01:11:16.140 | so yeah it's just you get that API key and it's doing the call behind the scenes in the AWS cloud
01:11:21.100 | so then what what does it upload to the cloud
01:11:24.060 | well it's asking the questions and like you know go to Google Maps and then it would say i understand
01:11:29.260 | that and it's actually clicking those buttons and doing the actions so the actual uh intent of
01:11:34.940 | what you're trying to do in the specific action
01:11:39.100 | and then if i was using it locally you couldn't use Nova Act locally it has to be uh connected
01:11:47.500 | to the internet to use it okay but if i for example though if i wanted it to like look at my Gmail
01:11:55.980 | oh yes
01:12:01.100 | ah yeah i see what you're saying yeah yeah i mean it is you know it's an API endpoint
01:12:06.220 | it's being passed to AWS so you know only passing information that you feel like it's not going to be
01:12:10.620 | without training the data or taking any of that nature but it's going to the AWS cloud and processing
01:12:15.420 | you know what to click on this button locally on your like browser
01:12:24.460 | so looks like it's not yeah see now it's even sorting by the rating it actually knows which rating
01:12:29.100 | to press so
01:12:31.740 | right yeah yeah well Nova Act is executing like in this MCP server example i say you know
01:12:43.020 | find the top three coffee shops near the Marriott Marquis and then i'm passing that
01:12:47.980 | information the LLM is understanding that plan and then it uses Nova Act to interact with the
01:12:53.740 | browser because like Cursor or Claude Code or Amazon Q they can't interact with
01:12:58.700 | the specific uh you know website by themselves it uses Nova Act to do that right but like given a
01:13:05.020 | question though like how does it come up with the plan oh the MCP server like the
01:13:11.340 | client so i picked the model in the example we had the MCP client as we showed the model
01:13:17.020 | right like Claude 3.5 yeah that's coming up with the plan same thing here you know i asked you know
01:13:22.460 | help me find the top three coffee shops near the Marriott Marquis the model that uh Cursor is using
01:13:28.620 | is coming up with that plan and then i'm using the Nova Act MCP server to act on it exactly so this is
01:13:34.460 | the plan search for Marriott Marquis click the Marriott Marquis you know search for the things
01:13:38.300 | and you see all this information Nova Act returned and actually it did return this time so i think
01:13:43.980 | the problem was with Claude Desktop but it got the top three coffee shops there right what are all
01:13:49.420 | the tools that uh Nova Act can do today uh so the MCP server is what i wrote so uh but the idea behind
01:13:56.860 | Nova Act again it can interface with the web browser that's the tool the browser is the tool and it can do
01:14:01.740 | anything on the website you can actually click through go through the example etc i see
01:14:05.980 | you got the repo you got an architecture that showed the MCP just so they can see it yeah so i mentioned
01:14:15.820 | uh there are official AWS MCP servers so uh this awslabs MCP and a lot of different um MCP servers here
01:14:25.340 | for the Nova Act one i created my own uh go back to the Nova Act examples or where do the
01:14:32.700 | ah here when i use Amazon Q to explain you know the MCP server for like what's going on what tool
01:14:41.180 | was the browser session performing an action on the browser so this is a good uh thing to talk about so
01:14:47.580 | can you dive deeper on the browser action function and we can see because this is how it's actually
01:14:57.180 | acting so uh Amazon Q browser action is designed to perform actions it has this uh what's cool about
01:15:07.340 | it it just does oops it's going to the code it performs a single action in the Nova Act browser
01:15:15.260 | so it's executing that action it stores this act.act is like what Nova says you know click the search
01:15:21.740 | bar do this x y you know the MCP client understands how to use this act.act it passes
01:15:29.260 | the correct action so we saw the example here one of the actions was like go to google maps or click
01:15:35.820 | this button or do that search that's how it's able you know these actions and then the Nova Act MCP server
01:15:41.980 | is translating that to actually click that button so the MCP server provides all the interfaces it
01:15:47.820 | necessarily needs so then these MCP clients can interact and do actions and do things yeah and Nova Act
01:15:54.620 | is just the model in the background that's able to click those buttons extending this question so your MCP
01:16:03.340 | server so Claude uh or Cursor running locally right it's calling your MCP server that's also running
01:16:10.380 | locally and your MCP server is the one that spun up the i guess the Chromium instance yeah
01:16:16.220 | is your MCP server taking screenshots of what you see in Chromium and shipping them to Nova Act
01:16:23.900 | the screenshots are local and then based on that like you can see it's actually getting all the
01:16:27.900 | information uh the final page information so it's not storing your screenshot data and sending that
01:16:33.100 | everything is running locally it's clicking those buttons based on what's on the browser session
01:16:38.140 | got it but is any of the information in Chromium does any of that need to be sent
01:16:42.860 | anywhere no no everything yes is running locally okay distinction okay perfect thank you
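The single-action `browser_action` tool discussed above can be sketched roughly as follows. This is a minimal illustration, not the speaker's actual server code: `FakeSession` is a hypothetical stand-in so the example runs without a browser or API key, and the shape of the returned dictionary is an assumption. In a real server the session would be a Nova Act browser session whose `act` method takes a natural-language prompt, which is what the talk refers to as `act.act`.

```python
from dataclasses import dataclass, field
from typing import Protocol


class ActSession(Protocol):
    """Anything with an act(prompt) method; stands in for a browser session."""

    def act(self, prompt: str) -> object: ...


@dataclass
class FakeSession:
    """Hypothetical stand-in so the sketch runs without a browser or API key."""

    history: list = field(default_factory=list)

    def act(self, prompt: str) -> str:
        self.history.append(prompt)
        return f"done: {prompt}"


def browser_action(session: ActSession, action: str) -> dict:
    """Perform one natural-language action and return a JSON-friendly result."""
    action = action.strip()
    if not action:
        return {"ok": False, "error": "action must not be empty"}
    try:
        result = session.act(action)
    except Exception as exc:
        # Report the failure to the MCP client instead of crashing the server.
        return {"ok": False, "action": action, "error": str(exc)}
    return {"ok": True, "action": action, "result": str(result)}


if __name__ == "__main__":
    session = FakeSession()
    print(browser_action(session, "click the search bar"))
```

Keeping the tool to one action per call matches the description in the talk: the MCP client's model owns the plan, and the server only translates each step into a browser action.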
01:16:56.220 | and let me open up the
01:17:09.900 | where was that looking so one of the things about making MCP servers is you have to provide a lot
01:17:18.540 | of context so uh for Nova Act like i say you know when writing actions for Nova Act be descriptive of
01:17:24.860 | what to do you know click the hamburger menu icon go to order history don't find my order so the more you
01:17:30.620 | know uh concise and just prescriptive you are about what you want to do the better you know search for hotels in
01:17:36.140 | Houston sort by average customer review so the more specific it is uh that's how the MCP uh client
01:17:42.700 | is able to make those great requests and find the information so type coffee maker in the search box press enter so
01:17:48.700 | the more prescriptive you are with Nova Act the better results you're going to get and i encoded that all into
01:17:53.260 | this uh MCP server so the clients can leverage that so i think that's probably one of the hardest things
01:18:00.540 | about making the MCP servers that's making sure you provide the right context of when to use the tool how to use the
01:18:05.980 | tool the inputs and outputs but once you solve all that it's very easy to plug and play the different MCP
01:18:11.740 | clients like we've done here
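One way to encode that context is in the tool definition itself. The sketch below assumes the usual MCP tool shape (`name`, `description`, `inputSchema` as a JSON Schema object); the wording of the description and the helper `missing_arguments` are illustrative, not taken from the speaker's repository.

```python
# Hypothetical tool definition: the description tells the client's model
# when to use the tool and how to phrase the action prescriptively.
BROWSER_ACTION_TOOL = {
    "name": "browser_action",
    "description": (
        "Perform ONE action in the open browser session. "
        "Be prescriptive: say exactly what to click or type, for example "
        "'click the hamburger menu icon' or 'type coffee maker in the "
        "search box and press enter'. Avoid vague goals such as "
        "'find my order'."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "action": {
                "type": "string",
                "description": "A single, explicit browser instruction.",
            }
        },
        "required": ["action"],
    },
}


def missing_arguments(tool: dict, arguments: dict) -> list:
    """Return the required argument names the caller did not supply."""
    required = tool["inputSchema"].get("required", [])
    return [name for name in required if name not in arguments]


if __name__ == "__main__":
    print(missing_arguments(BROWSER_ACTION_TOOL, {}))
```

Because the guidance lives in the description, any MCP client that lists the tools sees the same instructions, which is what makes swapping clients straightforward.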
01:18:39.500 | question yeah
01:18:52.140 | right so when Nova Act is doing something it's passing back the log of everything it's doing so you
01:18:57.500 | know what steps it did so the starting page the act the results the action result id so it's keeping a
01:19:03.420 | log of everything it did so it's able to get that JSON and understand what the id what the
01:19:08.860 | result is so you can see what it's doing so it can move on to the next step yep
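As a rough illustration of how a client could use such a log to pick the next step, the sketch below parses a JSON log and returns the first planned step that has not yet succeeded. The field names (`steps`, `action`, `success`) are assumptions for the example; the actual log format is not shown in the talk.

```python
import json
from typing import List, Optional


def next_step(plan: List[str], log_json: str) -> Optional[str]:
    """Return the first planned step with no successful entry in the log."""
    log = json.loads(log_json)
    completed = {
        entry["action"] for entry in log.get("steps", []) if entry.get("success")
    }
    for step in plan:
        if step not in completed:
            return step
    return None  # every planned step has already succeeded


if __name__ == "__main__":
    plan = ["open google maps", "search marriott marquis", "search coffee shops"]
    log = json.dumps(
        {
            "steps": [
                {"action": "open google maps", "success": True},
                {"action": "search marriott marquis", "success": False},
            ]
        }
    )
    print(next_step(plan, log))
```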
01:19:21.180 | other question
01:19:27.900 | sorry a quick question is this able to do uh like uh automated UI testing because of this
01:19:35.100 | well with Nova Act you know you can define like what you want it to do so you're going to have to define
01:19:40.940 | you know go to this button click this does this work so you can define that workflow so i mentioned
01:19:46.460 | before like back in the day like if i'm writing Selenium code i have to go click this h1 tag do
01:19:51.260 | this like now you can just write in natural language you know click this button click that button so yes
01:19:55.580 | it can handle that use case uh specifically of like opening the browser or checking these things and
01:20:00.300 | but you have to like you know this Nova Act search for coffee maker you know you specifically have
01:20:05.580 | to write what buttons to press yeah thank you
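A UI test written this way is essentially an ordered list of explicit natural-language steps. The sketch below is a hypothetical runner: `act` is any callable that performs one step and reports success, so in practice it could wrap a browser-automation call, while here a stub is used so the example runs on its own.

```python
from typing import Callable, List


def run_ui_test(steps: List[str], act: Callable[[str], bool]) -> dict:
    """Run steps in order and stop at the first one that fails."""
    for index, step in enumerate(steps):
        if not act(step):
            return {"passed": False, "failed_step": step, "completed": index}
    return {"passed": True, "failed_step": None, "completed": len(steps)}


if __name__ == "__main__":
    steps = [
        "search for coffee maker",
        "click the first result",
        "click the add to cart button",
    ]

    # Stub that pretends the add-to-cart button is broken.
    def stub(step: str) -> bool:
        return "add to cart" not in step

    print(run_ui_test(steps, stub))
```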
01:20:12.860 | let's see i guess we have time i can show some multi-agent collaboration with Strands that could be
01:20:25.020 | something cool uh i think i have a repo for that so should be uh go to the awslabs page where's that
01:20:37.900 | let's work
01:20:39.980 | and then Claude
01:20:49.340 | okay i'm just gonna copy this code and put it into our environment
01:21:14.700 | so in this example i'm actually going to show how Strands does multi-agent collaboration so one way
01:21:33.580 | i'm actually going to create a PowerPoint presentation based on uh you know a cloud migration request i want
01:21:40.220 | to like move my uh infrastructure from on premises to the cloud give me a presentation of how i would do that
01:21:46.620 | and so for this i created three different agents i created a cost analysis agent so i have a system
01:21:52.540 | prompt there a solutions architect agent that does a map out of what you're going to be doing and then each of
01:21:58.780 | these uh tools is an actual agent so this uh cost analysis has the docs MCP server the cost analysis MCP
01:22:06.940 | server it has its own prompt the presentation agent has its own system prompt it has a tool from the
01:22:13.100 | PowerPoint MCP server that i'm using and there's an architecture agent it also has you know its own
01:22:19.180 | specific tools system prompt etc so different agents for different uh things in the workflow and then i have
01:22:27.900 | this orchestrator agent called the migration orchestration agent it has a prompt and i tell it
01:22:32.540 | what tools it has access to and then the cool thing with Strands is i make this orchestrator agent and
01:22:38.620 | then the tools are just other agents in that so it knows when to call this agent for this particular tool
01:22:44.460 | when to do that and i say you know i want to migrate my uh workload so use the right tools to find
01:22:52.940 | that so i made a fictional company called Shop Easy e-commerce they have on-premises Java MySQL database
01:23:00.140 | yeah i want zero downtime migration like all these little constraints in there and i wanted to
01:23:06.060 | make a migration plan and a PowerPoint presentation that i can present to my executives of how this would
01:23:11.580 | work and i just define that and the orchestrator agent will figure out what to do i don't specifically say
01:23:18.060 | do this one first do that first we'll let the agent figure that out so let me run that Strands
01:23:26.540 | and it should be multi-agent
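The agents-as-tools pattern described here can be sketched in plain Python. This does not use the Strands API: the three specialists are stub functions, and `keyword_planner` is a deterministic, hypothetical stand-in for the decision that the orchestrator's model would make in the real setup.

```python
from typing import Callable, Dict, List

Specialist = Callable[[str], str]


def architecture_agent(request: str) -> str:
    return f"[architecture] target design for: {request}"


def cost_agent(request: str) -> str:
    return f"[cost] monthly estimate for: {request}"


def presentation_agent(request: str) -> str:
    return f"[slides] executive deck for: {request}"


def keyword_planner(request: str, available: List[str]) -> List[str]:
    """Stand-in for the model: choose which specialists to call, in order."""
    text = request.lower()
    wanted = [("architecture", "migrat"), ("cost", "cost"), ("presentation", "presentation")]
    return [name for name, key in wanted if key in text and name in available]


class Orchestrator:
    """Holds specialist agents as tools and calls the ones the planner picks."""

    def __init__(self, tools: Dict[str, Specialist], planner=keyword_planner):
        self.tools = tools
        self.planner = planner

    def run(self, request: str) -> Dict[str, str]:
        order = self.planner(request, list(self.tools))
        return {name: self.tools[name](request) for name in order}


if __name__ == "__main__":
    orchestrator = Orchestrator(
        {
            "architecture": architecture_agent,
            "cost": cost_agent,
            "presentation": presentation_agent,
        }
    )
    request = "migrate our on-prem shop with a cost estimate and a presentation"
    for name, output in orchestrator.run(request).items():
        print(name, "->", output)
```

The point of the pattern is that the request never states an order of operations; the orchestrator decides which specialist to call and when.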
01:23:33.020 | all right so cloud migration agents as tools all right again so all the MCP servers running locally
01:23:42.780 | it downloads it's using uvx it starts with the architecture design first generates a diagram
01:23:54.940 | i'm going to use waft so take some time it might fail but it will just update itself
01:24:00.700 | making another judgment
01:24:03.420 | all right i think it couldn't generate the diagram there but it's saying all right i'm just gonna this
01:24:17.580 | this is what the diagram should have this is what we're going to be doing
01:24:24.140 | now i'm just going to do a cost analysis on basically the things we did there so
01:24:31.500 | it's a this workflow takes a couple minutes to run but you can see it's calling all these
01:24:36.540 | agents uh different things it's understanding what to do what actions to take first it's finding pricing for
01:24:42.780 | EKS because it has a cost analysis tool and knows where to find that information so it has the up-to-date
01:24:49.340 | pricing all the time finding pricing for Aurora for its database so it's able to understand all that information
01:24:55.100 | and get the real-time up-to-date information just because we have that uh pricing MCP server from awslabs
01:25:02.460 | it's called cost analysis yeah cost analysis MCP server documentation all the stuff you need for
01:25:15.740 | finding the right price on AWS it has all that information and the agent was able to just use that
01:25:21.420 | and that's going to generate a report so it's still running again this does take a while because i'm
01:25:32.460 | asking a very complex question a lot of things going so it does take a couple minutes to run through all
01:25:37.660 | that it gets its monthly spend predictions monthly savings etc so it would understand all the information
01:25:45.260 | and get all up-to-date information based on the plan we provided and the last thing now it wants to create
01:25:53.020 | an executive presentation so it downloads the PowerPoint MCP server and now it's going to make a PowerPoint
01:25:57.980 | presentation based on that so adding the title slide so you know add a placeholder so generating PowerPoint is a
01:26:08.700 | very popular use case and there's an MCP server that can go ahead and just do that add bullet points
01:26:13.980 | etc so give it another minute or two