
Building Agentic Applications w/ Heroku Managed Inference and Agents — Julián Duque & Anush Dsouza


Chapters

0:00 Introduction to Heroku AI
4:24 Core Mission: The product's goal is to make every software engineer an AI engineer. Anush Dsouza, the Product Manager, states Heroku wants to make it “simple to attach agents and AI to your application.”
5:01 Agentic Control Loop: Heroku provides an "agentic control loop" running on its platform. This loop gives AI models access to tools like code execution and data access, all secured under Heroku's trust layer.
6:25 AI Primitives: Heroku AI is built on key primitives. These include inference for accessing curated models, the Model Context Protocol (MCP) for extending app functionality, and pgvector for handling embeddings.
7:08 Trusted Compute: Heroku's trusted compute layer, Dynos, runs first-party tools. They plan to expand this with tools for web search and memory, and users can bring their own tools via MCP.
13:23 Managed Inference: This service allows you to run AI models directly within your Heroku infrastructure. This keeps your data within your application's network for enhanced security.
14:38 Supported Models: The platform supports text-to-text models from Anthropic (Claude 3.5, 3.7, and 4), embeddings from Cohere Embed, and image generation with Stable Image Ultra.
50:51 Chat Completions API: The basic chat completions endpoint is designed to be highly compatible with the OpenAI and Anthropic APIs. The presenter notes it's "95% compatible with the OpenAI API," allowing the use of the OpenAI SDK. It supports standard parameters like temperature and max_tokens, as well as streaming responses.

Whisper Transcript

00:00:00.000 | Welcome, welcome you all to this lunch and learn. I hope you are enjoying the lunch
00:00:19.440 | that the conference brought you. We are going to be talking about our recent product that
00:00:27.840 | we announced at Heroku, which is Heroku Managed Inference and Agents, part of the Heroku AI
00:00:34.080 | offering, and we will be talking about the fundamentals of building agentic applications
00:00:39.360 | with this service. This is not a session, this is a workshop, so if you want to follow along,
00:00:45.720 | just bring your laptop. You only need a browser; you are not going to install anything on your computer,
00:00:51.420 | I swear. It is going to be easy to follow along. Also, this content is going to be available for you if
00:00:58.860 | you want to continue learning at home, with access to the Heroku platform during the event and
00:01:06.540 | the weekend. So initially we are going to do the setup, while you get ready by signing up for a Heroku
00:01:16.780 | account, or if you already have a Heroku account, you can just get access to the platform; I will
00:01:22.260 | give you a link for you to follow. Then my friend here is going to give you an overview of what we
00:01:28.620 | released, and then we will go and do the hands-on. My name is Julián Duque, I'm a principal developer advocate
00:01:34.860 | for Heroku, and I am here with Anush. Yeah, hi, my name is Anush, I work as a product manager for Heroku AI.
00:01:40.700 | Thank you everyone for joining us, very excited to talk to you all as well as walk you through how simple Heroku is.
00:01:48.140 | Of course. And let's get started. You can join the workshop Heroku AI channel in the AI Engineer Slack
00:01:57.580 | workspace; there I shared the slides and some links. Also, this is the QR code that is going to take you
00:02:05.740 | to the workshop site. On the workshop site you are going to find a link that has the content,
00:02:13.500 | and a form where you enter your Heroku account email. If you already have a Heroku account, just enter the
00:02:21.340 | email that you used to sign up for Heroku. If you don't have a Heroku account, go ahead and sign up for one;
00:02:27.580 | you don't need to put in any credit card information. With that form you are going to get access to
00:02:35.020 | Heroku, and you will be able to deploy applications and use our services for the duration of this week.
00:02:42.700 | If you have any problems with the sign-up process, let us know, we are here to help. And this is a follow-
00:02:48.780 | along workshop. What we are going to do is deploy a Jupyter notebook to Heroku, and then we are going to
00:02:56.700 | load all of the workshop content into this Jupyter notebook, and from there I will take you through the
00:03:03.820 | workshop material. So let's take a picture of this. Also make sure you join the Slack channel; the
00:03:09.900 | information is in there. And while you are doing the setup, my friend Anush here is going to give you an
00:03:17.180 | overview of Heroku AI and what we released. Okay, thank you for joining us, friends. How many of you are familiar
00:03:26.940 | with Heroku? Have you used Heroku before? Show of hands. Oh, nice. So, to set the stage: this is the most exciting time to
00:03:37.900 | be building right now, especially building with AI. And this point has come before, right? There have been
00:03:45.180 | inflection points of technology that have fundamentally changed the way we build things. There was the
00:03:49.980 | internet, there was the cloud, there were web apps, and now there's AI. Previously, when Heroku started,
00:03:56.060 | we took on a similar challenge. People wanted to build and deploy their apps, especially with Ruby on Rails, but
00:04:02.060 | it was hard. It was hard to operate, it was hard to deploy, it was hard to scale. Heroku made that
00:04:07.500 | super simple with git push heroku main, and a whole new host of developers could easily build, push, and scale
00:04:14.060 | their apps. We are doing a similar thing right now: we have taken on the challenge with AI. We have seen
00:04:19.980 | that people want to build with AI. They want to build agentic applications; they want to build agents that scale and
00:04:25.340 | operate in a way that is very simple. We want to make sure that every software engineer right now is an AI engineer,
00:04:32.620 | and that it is as simple as attaching agents and AI to your apps. So how do you do that currently? A lot of
00:04:40.860 | solutions are only for the day-one problems, but what happens on day two? How do you operate it? How do you
00:04:46.700 | scale it? There are so many models out there; how do you know which is the right model for your problem? How
00:04:52.940 | do you know your tools are running safely? Heroku has taken a very opinionated and curated set of
00:04:58.060 | models that we believe work best for our customers and that developers will enjoy. We have expanded that
00:05:05.020 | further by deeply entrenching these models in an agentic control loop that runs on Heroku, with access
00:05:11.100 | to tools like code execution and access to your data, all under the trust layer of Heroku. And for this extension
00:05:18.060 | of agents we are using the Model Context Protocol. You might have seen online that people keep asking
00:05:24.620 | who's going to build the Heroku of AI, or "this is the Heroku of X". So, to the question of who's
00:05:31.660 | building the Heroku of AI: it's Heroku, of course. Why wouldn't we? Okay, so what are the challenges right
00:05:39.260 | now that people are facing? One of the things that I can see is: how do you figure out that
00:05:45.260 | this model works best for you? How do you know that it's evaluated and traced and has the right
00:05:50.860 | technologies to make sure that it is performing the way you want it to perform? So these are the
00:05:56.060 | challenges that we're taking on. We are curating these things such that whenever you work with agentic
00:06:00.300 | applications, you don't feel like it's a bunch of knobs and bells and whistles like a plane; it's
00:06:05.820 | actually pretty simple. By taking the opinionated approach of having those defaults for you, you can get
00:06:11.820 | started with as little as one CLI command. You just do heroku ai:models:create, and that attaches AI to
00:06:19.020 | your app as a resource. Moving on to the next slide, of course. Okay, so we have three major things that
00:06:26.700 | work great for building AI applications, or agentic applications. One: we offer primitives like inference,
00:06:33.740 | where you can take a curated set of models and access them in your apps. We also have the Model Context
00:06:39.740 | Protocol to extend your apps; you can build remote MCP servers on Heroku in a simplified way, and you can also
00:06:45.740 | build standard I/O MCP servers that run in Heroku's trusted compute, and they can also scale to zero, so
00:06:51.820 | that you do not pay for things that you're not using. And we also have pgvector, which is a great vector
00:06:58.380 | database for embeddings. So all of these together give you all the primitives that you need
00:07:04.060 | to build agentic applications. So how do we do this? Heroku has a trusted compute layer called dynos,
00:07:14.380 | and these computers are the ones that run tools for you. For example, we provide first-party tools like code
00:07:23.740 | execution that run on Heroku's compute, and these can run on the compute and stream the data back to you and
00:07:30.060 | solve your problems. We plan to offer additional first-party tools such as web search for grounding, and
00:07:36.940 | memory. Memory is really important right now, so in the future we'll probably offer memory
00:07:42.140 | and other compute tools that can run on Heroku's compute and be provided to you. You can bring your own
00:07:46.860 | tools as well using MCP; they can run on our compute and stream things back to agents. Okay, with that I'll
00:07:54.860 | hand it over to Julián for a more hands-on workshop, and you can build all the things I spoke about within the
00:08:00.940 | next 30 minutes. Okay, let's go back to this last picture for the folks that just joined, so you get
00:08:07.820 | access to the workshop content. And remember, we have the Slack channel in the ai.engineer workspace,
00:08:15.260 | where I already shared these slides and the link so you can follow along. You can follow along with me
00:08:20.460 | right now, or you can follow along at home with more time. So what we are going to do is the following:
00:08:27.980 | let me show you. That link is going to take you to this page. On this page you have access to the
00:08:34.860 | workshop resources. This is a website that has the step-by-step instructions for what you are going to
00:08:40.620 | be doing today; I'm going to be following these same steps right here, right now. And second, we have a way for
00:08:48.540 | you to get access to Heroku. If you already have a Heroku account, just put in your email, the email that you
00:08:55.340 | use to log into Heroku. If you don't have one, go and sign up first. You can sign up for Heroku; you don't need
00:09:02.540 | to enter your credit card information. With just this invitation you are going to get access to the workshop.
00:09:08.540 | These services are going to be enabled until the 7th, like this weekend, and I can extend it if you
00:09:15.980 | like the content we have for you. This is going to send you an invitation to a team. The team is
00:09:22.460 | called AI Engineer World's Fair. Go accept that invitation, and then you have access to this dashboard.
00:09:30.860 | So for the Heroku dashboard, you go to dashboard.heroku.com, you log in, and you should get access to this view.
00:09:38.460 | We already have some pre-deployed applications here: the workshop that I will be showing you, the
00:09:45.900 | Jupyter notebook, and the Brave Search MCP. We are going to use a pre-existing MCP
00:09:51.900 | to give you an example of how to call that MCP from the Managed Inference and Agents endpoint. And I also
00:09:59.500 | have another application here, which is the AI engineer data app; it has a Heroku PostgreSQL database,
00:10:05.020 | and we are going to see how we can build agents that have access to this database as well. But we start the workshop
00:10:14.140 | first by deploying a Heroku application. So we have this Heroku Jupyter template
00:10:22.300 | that you can deploy to Heroku. Go to the repository after you have the account and are logged into Heroku,
00:10:32.620 | and just click to deploy. With click-to-deploy you are getting this page that will ask you for an application
00:10:42.620 | name; make sure that this application name is unique. And the internet situation is a little bit slow; let's
00:10:50.780 | hope it gets better so I can show you things working live, otherwise I need to start getting access somewhere.
00:11:02.140 | If you just joined, please go to the Slack channel (it's called workshop-heroku-ai) in the AI
00:11:10.860 | Engineer Slack, and you can find the how-to-get-started instructions there. If you are stuck somewhere
00:11:16.540 | or you don't know how to get started, please raise your hands and I can come help you out. Beautiful, this is
00:11:21.580 | the deployment page. We are going to create a new application; remember to use a unique name.
00:11:26.700 | Let's do... this is jduque-jupyter, this is going to be my Jupyter. It is not unique, and if it is not unique
00:11:34.060 | you are going to get an error, so let's add "workshop" to get a unique name. And for the app owner, if you want this workshop
00:11:43.740 | to work without you paying anything, make sure to select AI Engineer World's Fair, which is the team we
00:11:52.620 | have invited you to. These Jupyter notebooks are password protected, so there is an environment variable
00:12:01.260 | here for you to define the password for this workshop. I'm going to use a super secure password, "lab",
00:12:07.820 | and then click to deploy. So this is going to deploy the application with a dyno where the Jupyter notebook
00:12:16.460 | is going to run. The dyno is pretty much the container unit where your application runs, like the virtual
00:12:23.100 | machine on Heroku. And it also has Heroku Postgres, so all the work that you are going to do in the
00:12:29.260 | Jupyter notebook is going to get persisted into this database. And this is going to take a little bit of
00:12:35.980 | time to install. This is pretty much fetching all the source code of this Jupyter notebook
00:12:41.900 | template to Heroku, building that application and all of the dependencies, and then you have it
00:12:48.540 | available for you to use. So basically you are going to get something like this. I already have this deployed,
00:12:56.220 | but you don't have any workshop content here, so we are going to be loading that notebook.
00:13:03.740 | And to load that notebook, you keep following the instructions. Oh, the third step, which I missed, is
00:13:11.020 | how we can provision Managed Inference, mainly, how we can provision an AI model for a Heroku
00:13:21.180 | application. So Managed Inference is a service that lets you run AI models within the same infrastructure
00:13:29.580 | where your application is running, so the data is not going to third parties. If you are using, let's say,
00:13:34.940 | the OpenAI or Anthropic APIs, the data is going outside your application. This keeps
00:13:42.540 | everything running inside the same network that your application is running in.
00:13:46.380 | And to provision this you can use the Heroku CLI, but we are going to make things easy.
00:13:52.060 | Once the application is deployed... this is taking time to deploy, so let's go to something that is already
00:13:59.500 | deployed, let's say this one that I created before. You go to your application in the dashboard, you
00:14:07.500 | click through to the management page, then you go to Resources,
00:14:14.060 | and Resources is where you can provision add-ons: add-ons like Heroku Postgres or the Key-Value Store,
00:14:20.860 | or any other third-party item that we support on our marketplace. The one that we are going to provision
00:14:26.700 | is called Heroku Managed Inference and Agents; this is our AI offering. So you are going to click
00:14:33.020 | on this add-on, and then you will select the model that you want to use in your application. We support
00:14:41.740 | text-to-text models from Anthropic (Claude 3.5, 3.7, and 4), we support Cohere Embed for embeddings, and we
00:14:50.140 | support Stable Image Ultra for image generation. We are going to be working mostly with text-to-text
00:14:57.020 | inference models, so let's do Claude 4. We submit, and that's it: now my application can access
00:15:06.860 | AI services. That's the only thing that you need to do. And how this works: this is going to enable your
00:15:13.980 | application to have access to environment variables that contain the API URL, the API key, and the model ID,
00:15:22.300 | and with those three things you can use an existing SDK that supports the OpenAI specification
00:15:29.580 | or the OpenAI API, or you can perform an HTTP request, or build your own solution to run these
00:15:38.780 | applications. So now we've provisioned AI into our app. Now our other app is deployed, so we can do the same
00:15:48.460 | with that one. So I'm going to keep following along. Then let's go and load the workshop. I'm going to be copying
00:15:56.700 | this URL; that's the Jupyter notebook we are going to load into the Jupyter template that we deployed. And here
00:16:04.860 | you go to File, Open from URL, you paste that URL, click on Open, and voila, we have our notebook ready to start.
00:16:17.820 | So remember that we provisioned the Managed Inference and Agents service into our application.
00:16:25.100 | That is going to give us three environment variables. If I go here to Settings,
00:16:31.340 | Reveal Config Vars, we see the environment variables there: the inference key, the model ID,
00:16:41.020 | and the inference URL, with the API key being what we will use to call the service. So obviously the first step that
00:16:49.740 | we will do is to load those environment variables in our Jupyter environment. We will do this workshop
00:16:57.260 | in four parts. The first one is setting up the environment; we are going to just load those environment variables
00:17:03.980 | so we can continue doing our activities. Second, we are performing managed inference; this is your basic chat
00:17:11.500 | completions endpoint, and we will see two examples, one doing basic chat completions and the other one
00:17:19.420 | doing a streaming version of that chat completions endpoint. The third, which is one of my favorite ones, is
00:17:26.620 | the Heroku tools, or the agents part. We have another endpoint that lets you run agents that we support
00:17:35.820 | natively: we have code execution agents, we have database access agents, document conversion, executing commands on
00:17:44.700 | a dyno, and we are going to continue adding more to those. And this is just an endpoint that performs the
00:17:51.580 | execution of the agent in the Heroku infrastructure, in our dynos, and it is just a one-off dyno. What is
00:17:58.780 | a one-off dyno? It just spins up, runs the code, and scales back to zero, so you are not paying for an
00:18:06.140 | application that is constantly running; you are just paying for the amount of compute that that tool took to
00:18:12.300 | execute. And that endpoint also supports MCPs, MCPs that you can deploy to Heroku
00:18:19.500 | and then attach to your inference agents endpoint. And last but not least, we are going to take a look
00:18:27.420 | at that MCP support that we have: how you can deploy an MCP to Heroku, attach that MCP to the agents endpoint,
00:18:35.500 | and also use those MCPs externally, remotely, through the MCP gateway or MCP toolkits that we have.
00:18:44.700 | I know it sounds like a lot, but it's going to be easy. I have a couple of exercises for you if you want
00:18:49.820 | to write some code, but basically what I will be doing here is just running the code that I
00:18:56.140 | already have implemented here. (Hi, friend, thank you for coming.) Okay, so the first thing: I told you we need
00:19:03.180 | to load these three environment variables, the inference URL, inference key, and inference model ID, and there is a
00:19:09.580 | fourth one that we will use for certain tool executions on Heroku, which is the target application
00:19:19.340 | name. So what you are going to put here in the target application name is the name of the application you
00:19:27.580 | deployed, the name of the Heroku Jupyter application that you deployed.
00:19:33.500 | In this case my workshop, my Jupyter notebook, is called ai-engineer-workshop; this is the one that I'm going
00:19:41.580 | to use here. I am giving permission to my tool to run commands and perform compute operations on this
00:19:49.500 | dyno, on this application. So here you just replace it with the name of your application so you can do those
00:19:57.100 | examples, and later I will show you how you can also give access to the application that has the database,
00:20:04.780 | to run the other examples that we have for you. So let's load the environment variables. Now the environment
00:20:12.220 | variables are ready, so I can continue with the examples. Then let's go with managed inference.
00:20:20.700 | I mentioned that for managed inference we have the basic chat completions endpoint. This is an endpoint that you
00:20:26.860 | will find on services like OpenAI or Anthropic: basically you ask a question, you send a message
00:20:33.980 | or an array of messages, and you are going to get a response. This endpoint also supports custom tool
00:20:41.500 | execution, but that tool execution is something that happens in your code, so you need to specify the
00:20:48.620 | function and capture the information to execute it; or, if you are using an agent framework like ADK or
00:20:56.300 | LangChain, it also handles custom tool execution internally. But for the example we are going to
00:21:02.860 | perform a basic inference. To perform the basic inference we are going to do a basic HTTP request.
00:21:09.660 | We have the inference key and the model ID in the payload. We are going to use Claude 4, which is the model
00:21:17.420 | that I provisioned here on my app, and the message that I'm going to send as a user is "explain the concept of managed
00:21:25.980 | inference in one sentence". You can adjust the parameters that the OpenAI API supports, like
00:21:32.300 | temperature, max tokens, top_p, etc. You can take a look at the documentation by clicking on the path of the
00:21:42.540 | endpoint, and it will take you to the documentation of that endpoint and the parameters we support.
00:21:47.820 | And then, after performing the HTTP request, we get the response back as JSON.
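For following along at home, here is a minimal sketch of the kind of request this step performs, assuming the add-on's config vars are named INFERENCE_URL, INFERENCE_KEY, and INFERENCE_MODEL_ID (as shown in the dashboard) and that the chat completions path follows the OpenAI-style /v1/chat/completions layout:

```python
import os

import requests

# Assumed config var names set by the Managed Inference add-on (visible in the dashboard).
INFERENCE_URL = os.environ["INFERENCE_URL"]
INFERENCE_KEY = os.environ["INFERENCE_KEY"]
INFERENCE_MODEL_ID = os.environ["INFERENCE_MODEL_ID"]

payload = {
    "model": INFERENCE_MODEL_ID,
    "messages": [
        {"role": "user",
         "content": "Explain the concept of managed inference in one sentence."}
    ],
    # Standard OpenAI-style parameters (temperature, max_tokens, top_p, ...) apply.
    "temperature": 0.5,
    "max_tokens": 256,
}

response = requests.post(
    f"{INFERENCE_URL}/v1/chat/completions",  # assumed OpenAI-compatible path
    headers={"Authorization": f"Bearer {INFERENCE_KEY}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```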
00:21:56.220 | So let's run this and see how it works. We get a response like any other AI endpoint returns: we have a chat
00:22:06.620 | completions object with the different choices, like the list of messages that it returns. Basically we have just
00:22:12.620 | one message here, as an assistant, and the response is "managed inference is a cloud service that handles the
00:22:19.740 | deployment, scaling, and maintenance of machine learning models for real-time predictions", etc. So that's a basic
00:22:26.380 | example; this is something that you can do everywhere, since getting inference became a commodity.
00:22:33.980 | And to show that, we also support streaming. Of course, to have a better experience
00:22:39.660 | while building our applications, we need to support streaming, meaning that it's not going to
00:22:45.660 | return the whole answer as just one JSON object; it's going to start giving you chunks, so you can get that
00:22:52.140 | real-time feedback when getting the response from the inference service.
00:22:57.260 | So here I'm just consuming chunks and printing out those chunks on the screen. The first
00:23:06.380 | sample is going to show you the raw version of this request. This is using a server-sent events
00:23:14.620 | approach: it's sending you message events, every message event has data, and each data payload is a piece of that
00:23:23.580 | completion object, but in this case you are getting a delta, a piece of that message.
00:23:31.180 | So you see here in the content we get nothing, then "streaming is crucial", and then it continues
00:23:39.660 | giving me the response. We get a bunch of different chunks, but let's take a look at how that looks
00:23:46.940 | in an application when you are rendering the chunks; that was just the raw objects, to understand how the API
00:23:53.020 | works. Here I have another example where I am doing some parsing of that information, extracting the delta
00:24:00.380 | content of each response, and then at the end rendering everything as beautiful Markdown. When I execute this,
00:24:07.980 | the inference service is going to start responding in real time, chunk by chunk, with all the messages that I'm
00:24:15.580 | getting from the service. As you can see, it is a Markdown response, so at the end, when we finish,
00:24:21.660 | we use the Markdown display object here in Jupyter to get the beautiful answer. Sometimes it will
00:24:31.100 | give you a code example, depending on what the inference service returns, and that's how this
00:24:38.300 | works, and this is how you can build an application to consume these real-time streams.
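A minimal sketch of the streaming consumer described above, under the same naming assumptions; the `[DONE]` terminator is an OpenAI-style convention and may differ here:

```python
import json
import os

import requests

url = f"{os.environ['INFERENCE_URL']}/v1/chat/completions"  # assumed path, as above
headers = {"Authorization": f"Bearer {os.environ['INFERENCE_KEY']}"}
payload = {
    "model": os.environ["INFERENCE_MODEL_ID"],
    "messages": [{"role": "user", "content": "Why is streaming crucial for chat UIs?"}],
    "stream": True,  # ask for server-sent events instead of a single JSON object
}

collected = []
with requests.post(url, headers=headers, json=payload, stream=True, timeout=120) as r:
    r.raise_for_status()
    for line in r.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data:"):
            continue  # skip blank keep-alive lines and event-name lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # OpenAI-style stream terminator (assumed)
            break
        chunk = json.loads(data)
        if not chunk.get("choices"):
            continue
        delta = chunk["choices"][0].get("delta", {}).get("content") or ""
        collected.append(delta)
        print(delta, end="", flush=True)  # render the answer chunk by chunk

full_answer = "".join(collected)  # e.g. feed this to a Markdown renderer at the end
```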
00:24:44.780 | But now let's talk about the good part, which is the agents. Who has built an agent before, like an
00:24:52.780 | application that executes a tool or runs some code? Perfect, we have a couple of hands there, amazing.
00:25:00.540 | So, these are the tools that we support today (and Anush here, as product manager, can tell us what we are going to be
00:25:06.060 | supporting in the future). Dyno run command: dyno run command allows you to execute a command on a dyno,
00:25:15.100 | like a Unix command or a script that you already have pre-deployed on your application. It will
00:25:21.980 | execute that and give you the response. So this is pretty good for running trusted code, code that you have written,
00:25:29.580 | that you know works, that has a predictable result, and you can just have that
00:25:37.180 | code running on your dyno and execute it through this agent, this tool. We have query databases: we
00:25:44.620 | have two tools here, one is postgres get schema and the other one is postgres run query. Postgres get schema:
00:25:52.700 | I mean, the LLM doesn't know the shape of your database. If you ask for data, it will hallucinate an SQL query;
00:26:00.620 | or, if you have some sort of retrieval-augmented generation, or if you are grounding the prompt
00:26:08.220 | with the shape of your database, it will generate something that is close to the shape of the data
00:26:14.780 | in your database. But with this tool it will get exactly what you have in the database, and then
00:26:22.620 | we can pass that schema to the next tool, which is run query. It will generate a query that will run on
00:26:29.260 | your database, get the result, and make inference over that data. Then we have a couple of other tools for
00:26:38.220 | document transformation: HTML to Markdown and PDF to Markdown. You pass a URL, like a website or the URL of a PDF
00:26:47.820 | that is hosted somewhere, and it will give you the Markdown response so the inference can work over
00:26:53.740 | that text content; it's going to perform that text extraction for you. And my favorites are the code
00:27:00.540 | execution ones. We support, as of today, Python, Node, Ruby, and Go. So the LLM will generate code that will run on
00:27:09.980 | Heroku, on a one-off dyno, and it will return the response back to the inference service. And these also
00:27:16.700 | support dependency installation, so the input has a packages array parameter where
00:27:24.860 | the LLM automatically says, "okay, my Python script is going to use pandas or NumPy, let's install them before
00:27:33.900 | I attempt to execute the tool". And all of these are just MCPs that we natively support, but you can also extend
00:27:42.940 | this agents endpoint by deploying your own MCPs. So now let's take a look. The agents endpoint is
00:27:50.780 | different from chat completions. It works similarly, with some minor differences. For example, all of the
00:27:59.180 | responses are a stream, so it doesn't support the synchronous call where it waits to give you
00:28:05.900 | the full response, because code execution, database access, and tool execution take time, so we
00:28:11.740 | prefer to have it as a stream. And then, depending on the tool you are using, you might need to give
00:28:18.540 | the tool access to your database or to an application, for example, to execute a command. So let's try the first one.
00:28:28.780 | Let's run a demo to run a command on a Heroku dyno. Here we are setting up the payload. Similar to chat
00:28:39.740 | completions, we have the model ID, and we have the message: we are asking,
00:28:46.380 | "what is the current date and time on the server?" LLMs don't know anything about the real time;
00:28:53.180 | they need to use tools to be able to have context about what's going on right now. I have asked an LLM
00:28:59.420 | for information and it gives me information from 2024, because that's the date when they stopped
00:29:04.860 | training the model. But with this, it knows exactly what time it is, because it's running this command on the server.
00:29:12.780 | So the tool is a Heroku tool, so I'm specifying the type to run this on Heroku. The tool that I'm executing
00:29:20.860 | is the dyno run command, and then the parameters for this tool are: the target app name (remember, I told
00:29:28.780 | you, use the name of the application you deployed), then the command to run, which is date, and a description of
00:29:37.340 | pretty much what the output of this command means.
00:29:43.500 | And I'm calling the agents Heroku endpoint, passing that payload that has the tool.
00:29:52.620 | This is also a streaming endpoint, so I'm handling the stream output here, and I'm just extracting the tool
00:30:00.380 | calls. This is also a standard API shape that other APIs support for tool calling, but here the tools run on Heroku.
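A sketch of what that request might look like. The endpoint path (/v1/agents/heroku), the `heroku_tool` type, and the nested `runtime_params`/`tool_params` field names are reconstructed from the walkthrough, so treat them as assumptions to verify against the add-on docs:

```python
import json
import os

import requests

APP_NAME = "ai-engineer-workshop"  # the app the tool is allowed to run against

payload = {
    "model": os.environ["INFERENCE_MODEL_ID"],
    "messages": [
        {"role": "user", "content": "What is the current date and time on the server?"}
    ],
    "tools": [
        {
            "type": "heroku_tool",        # assumed discriminator for first-party tools
            "name": "dyno_run_command",
            "runtime_params": {           # assumed nesting; check the docs
                "target_app_name": APP_NAME,
                "tool_params": {
                    "cmd": "date",
                    "description": "Returns the server's current date and time.",
                },
            },
        }
    ],
}

# The agents endpoint always streams, so consume it as server-sent events.
with requests.post(
    f"{os.environ['INFERENCE_URL']}/v1/agents/heroku",  # assumed agents path
    headers={"Authorization": f"Bearer {os.environ['INFERENCE_KEY']}"},
    json=payload,
    stream=True,
    timeout=300,
) as r:
    r.raise_for_status()
    for line in r.iter_lines(decode_unicode=True):
        if line and line.startswith("data:"):
            data = line[len("data:"):].strip()
            if data and data != "[DONE]":
                print(json.loads(data))  # raw chunks: tool calls, results, completions
```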
00:30:09.260 | So let's execute it and see the example, and I hope you
00:30:18.220 | can execute this tool in your Jupyter notebook too. So it is running this on my ai-engineer-workshop
00:30:25.500 | application; it's going to execute the dyno run command. So now it is waiting until it runs the command on my
00:30:32.700 | app. We get the date and time, and then the inference response. Cool. I mean, it's a basic example, but a
00:30:41.900 | powerful one: instead of running date, imagine if you deploy your own script that does your data extraction
00:30:49.180 | or connects to external services; you can just call it from here, from this endpoint, as well.
00:30:55.100 | The other one is code execution. With code execution I am asking the inference service to perform an
00:31:04.140 | operation, and I am passing that code execution tool. Now that it has that tool enabled, it will
00:31:12.620 | generate code in that language, run it on Heroku, and give you the response back.
00:31:17.420 | For this example I will invite you to change node to python, ruby, or go to see different responses.
00:31:26.700 | I'm a Node developer, even though I have a bunch of Python here; some of you might recognize this is LLM-generated, so,
00:31:34.860 | sorry, not sorry, that's what tools are for. And I'm going to be executing code_exec_node to perform
00:31:43.340 | this operation: what is the 30th Fibonacci number? This is a basic algorithm; I just wanted to do something
00:31:49.580 | easy. I will invite you to change this to perform a different operation and see what you can do;
00:31:55.660 | try to break this thing. And for the execution I am just parsing the response, so I
00:32:01.340 | get beautiful Markdown code highlighting and everything. So let's perform the operation.
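A sketch of the code-execution variant, reusing the same assumed endpoint and field names as the previous sketch; swap `code_exec_node` for `code_exec_python`, `code_exec_ruby`, or `code_exec_go` as the presenter suggests:

```python
import os

import requests

# Tool names follow the walkthrough (code_exec_node, code_exec_python, code_exec_ruby,
# code_exec_go). Per the talk, the model can also request dependency installs
# (e.g. pandas, numpy) through a "packages" array in the tool input.
payload = {
    "model": os.environ["INFERENCE_MODEL_ID"],
    "messages": [{"role": "user", "content": "What is the 30th Fibonacci number?"}],
    "tools": [{"type": "heroku_tool", "name": "code_exec_node"}],
}

with requests.post(
    f"{os.environ['INFERENCE_URL']}/v1/agents/heroku",  # assumed agents path, as above
    headers={"Authorization": f"Bearer {os.environ['INFERENCE_KEY']}"},
    json=payload,
    stream=True,
    timeout=300,
) as r:
    r.raise_for_status()
    for line in r.iter_lines(decode_unicode=True):
        if line.startswith("data:") and line[5:].strip() not in ("", "[DONE]"):
            print(line[5:].strip())  # generated code, execution output, final answer
```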
00:32:09.020 | It will generate the algorithm. Let's see. Okay, it will execute code_exec_node; this is the code that it's
00:32:22.460 | passing as the input, this is the beautiful Markdown syntax-highlighted code that it will execute.
00:32:28.780 | That's JavaScript, yes, that's JavaScript. And at the end it's going to execute that code on Heroku, and then
00:32:35.580 | I am getting the response back: the Fibonacci number is 233. But I want to do that in Go, so just change
00:32:45.660 | from node to go, execute this, and it will do the same, but now the code that we are going to
00:32:53.820 | see here is Go code. It will do the same thing: get this Go code on a Heroku dyno, compile it, run it, and
00:33:01.180 | get you the response back. That looks like Go code to me. I am not a Go developer, but that definitely
00:33:08.300 | looks like Go. And it executed the code and gave me the same result, plus an explanation, the inference
00:33:16.140 | operation over that tool execution. That's cool, but so far we have just called one tool. The good thing
00:33:23.500 | about the agents is that we can chain calls together, mix and match; we have different agents acting
00:33:30.540 | together. So now let's complicate things a little bit more, and we are going to use two tools. One is the
00:33:36.780 | HTML to Markdown: go to a website, do something, then take that result and use it with this other agent.
00:33:45.100 | For that we are going to use two tools, HTML to Markdown and code execution Python. The prompt here is:
00:33:55.420 | use the Python snippet from the Wikipedia page for the Euclidean algorithm to calculate the greatest common divisor of
00:34:04.060 | 252 and 105. So it doesn't know the algorithm; I am telling it directly: go fetch the exact one that is on
00:34:16.300 | Wikipedia, then run this code on Heroku, and give me the response. I'm enabling those two tools, HTML to Markdown and
00:34:24.940 | code execution Python. Let's run this, and now we are going to see multiple tool calls:
00:34:32.300 | getting the page in Markdown. So now it has recognized that the Euclidean algorithm page is this one;
00:34:39.180 | now it is reading the whole content of that page in Markdown, attempting to get the algorithm from there;
00:34:46.460 | then it's going to generate the code, then it's going to run the code, and at the end I'm going to get the
00:34:51.420 | response. This execution takes a little bit more time because it is performing multiple tool runs at the
00:34:57.660 | same time.
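A sketch of the two-tool payload for this chained run; the tool names `html_to_markdown` and `code_exec_python` mirror the walkthrough, and the request is sent exactly like the earlier agent calls:

```python
import os

# Two tools; the model decides the order: fetch the page first, then run the code.
payload = {
    "model": os.environ["INFERENCE_MODEL_ID"],
    "messages": [{
        "role": "user",
        "content": (
            "Use the Python snippet from the Wikipedia page for the Euclidean "
            "algorithm to calculate the greatest common divisor of 252 and 105."
        ),
    }],
    "tools": [
        {"type": "heroku_tool", "name": "html_to_markdown"},
        {"type": "heroku_tool", "name": "code_exec_python"},
    ],
}
# POST this to the agents endpoint and consume the SSE stream exactly as in the
# dyno_run_command sketch above.
```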
00:34:58.220 | Any questions so far? Is anybody doing it? If not, you have access to the workshop; you can do it
00:35:07.020 | at home with more time, modify the code, analyze it, and make sure
00:35:13.500 | the concept is understood. So it got the algorithm here, and it executes the thing on Heroku. Now I'm just
00:35:23.100 | waiting for the last inference step... and that execution is done, there you go: "I found the Python implementation
00:35:31.900 | from Wikipedia, I calculated that number that you asked me for, this is the Python snippet from Wikipedia,
00:35:38.860 | and then it's implemented in this code, and this is how I did it," with the explanation. Beautiful, it works.
00:35:47.420 | Now, before I get into the Postgres execution, we need to have access to the database, and right now the
00:35:57.340 | application that you have doesn't have access to that database. If you want to give it a try,
00:36:03.260 | you are going to go to the Heroku dashboard (where is the Heroku dashboard? here),
00:36:11.820 | to the AI Engineer World's Fair team,
00:36:19.500 | click on ai-engineer-data,
00:36:21.980 | go to Resources,
00:36:27.660 | and we have one thing here: the Postgres database tools only work on followers. Followers on Heroku are read-
00:36:39.820 | only. For security reasons, we don't want to give an LLM tool write access to your database, because of course
00:36:48.940 | LLMs are not trusted for that. If you want to give write access to your production database to an
00:36:56.140 | agent, I will invite you to deploy an MCP that does that, like a PostgreSQL MCP, and give the access at your
00:37:03.900 | own risk. With the ones we maintain, we want to make sure we don't break production. So we have two different
00:37:11.340 | databases here. The master one, or the main one, is DATABASE; don't touch that one. The other one, the one that
00:37:18.620 | has attachments, is the follower. So you are going to expand these managed attachments and add your application,
00:37:29.980 | like julian-jupyter, and that's it: now I gave my application access to the database.
00:37:40.140 | Second, you will need to get the name of that database. So go to your application,
00:37:48.860 | in this case julian-jupyter,
00:37:51.020 | go to Resources,
00:37:55.020 | and this database here,
00:37:58.940 | ai-engineer-data:
00:38:06.060 | in my application it is called DATABASE, so that's the name I'm going to use
00:38:12.860 | in code. Since I am working on ai-engineer-workshop, the name is different, so I'm going to show you, so
00:38:23.980 | you can recognize the differences in names. This is just the name of the environment variable on Heroku
00:38:30.300 | that has the connection string. So here I have access to that database on this application; my database is
00:38:39.180 | called HEROKU_POSTGRESQL_AQUA, so that's the name you are going to get and change in your code:
00:38:46.300 | HEROKU_POSTGRESQL_AQUA.
00:38:49.820 | So now I have access to that specific database on my app. I cannot access databases from other
00:38:57.500 | applications; this is for security reasons. This is why I have to do the attachment first,
00:39:02.220 | to be able to give permissions to my database. If the database lives in your application,
00:39:08.300 | you don't need to do this, because your application already has access to it. That's
00:39:13.100 | the only thing that we need to do here to set up. Then we are going to enable two tools,
00:39:20.860 | postgres get schema and postgres run query. These are going to run on my application (remember
00:39:28.620 | the target app name), and they are going to access that specific database; I'm giving permission to go to
00:39:37.020 | those two places. So the database that we have here is a database for a solar energy company. It contains
00:39:45.660 | a table full of metrics, with energy consumed and energy produced every hour, in kilowatt-hours,
00:39:54.700 | and we have metrics for maybe two months. So now, what type of applications can we build with this?
00:40:02.940 | Asking questions like "how much energy has been saved in the last 30 days?": it's going to
00:40:09.740 | understand the shape of the database, it will see that it has a metrics table with these columns,
00:40:16.220 | then it will go and generate an SQL query to be able to get this information, and then run that query and
00:40:24.700 | give you the response. Let's execute that to see how it works.
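A sketch of enabling the two database tools; `postgres_get_schema`, `postgres_run_query`, and the attachment parameter name are again assumptions reconstructed from the walkthrough:

```python
import os

# The attachment name shown under Resources; yours will differ (e.g. DATABASE
# when the database lives in your own app).
DB_ATTACHMENT = "HEROKU_POSTGRESQL_AQUA"
APP_NAME = "ai-engineer-workshop"  # the app the follower database is attached to

payload = {
    "model": os.environ["INFERENCE_MODEL_ID"],
    "messages": [
        {"role": "user", "content": "How much energy has been saved in the last 30 days?"}
    ],
    "tools": [
        # Field names reconstructed from the walkthrough; consult the add-on docs
        # for the exact schema.
        {
            "type": "heroku_tool",
            "name": "postgres_get_schema",
            "runtime_params": {
                "target_app_name": APP_NAME,
                "tool_params": {"db_attachment": DB_ATTACHMENT},
            },
        },
        {
            "type": "heroku_tool",
            "name": "postgres_run_query",
            "runtime_params": {
                "target_app_name": APP_NAME,
                "tool_params": {"db_attachment": DB_ATTACHMENT},
            },
        },
    ],
}
# POST to the agents endpoint and consume the stream exactly as in the earlier sketches.
```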
00:40:33.820 | So the first step is: "I don't understand your database, let me get the schema."
00:40:39.340 | So to get the schema, it executes the tool,
00:40:44.460 | and then you are going to see the full schema printed here on the screen in a moment. So now we have
00:40:50.940 | the schema of my database; it sees the different tables: metrics with its fields, products, systems, users, etc. So I
00:41:02.780 | have the information that I need here. Beautiful. Now it will run the query: it already generated the
00:41:11.260 | query, which it is a hundred percent sure will run on my database, because it knows the shape.
00:41:17.740 | This is the response; it got the data. And now that it has the data, it's going to perform the
00:41:24.460 | inference: it's going to extract the answer from that data, and at the end it's going to tell me the information.
00:41:31.020 | It wanted to run more queries to get a breakdown of the energy by system. I have three different systems:
00:41:39.260 | one system that performs well, another one that doesn't perform very well, and another one that is
00:41:44.940 | totally horrible. So it is also trying to get more information, just from one question. It is doing
00:41:51.340 | inference on my data, and I just enabled two tools; I didn't need to do anything else. And there we go:
00:41:59.580 | the report. These are the key metrics, this is how much you have saved, and this is the breakdown per
00:42:06.780 | system: the best performer, the good performer, and the energy deficit. This is getting access to your data
00:42:14.300 | and acting over your data. We can add a third tool, let's say code execution Python: "I want to generate a
00:42:21.820 | graphic", so it will then do a matplotlib graphic with that data, and you can keep adding more things.
00:42:31.180 | But the more agents and tools that you add, the more time it will take to perform the whole operation.
00:42:37.820 | But that's an example of how you can mix and match these tools that we have here, and how you can give agents
00:42:45.340 | access to your data. I have a couple of exercises: just try to come up with an example that uses a PDF, extracts
00:42:54.380 | something from that PDF, and runs code. You can do this on your own time; we are running low on time.
00:43:00.540 | I want to show you now how we can deploy and run MCPs with this same endpoint. Those are the tools we
00:43:11.020 | maintain, but what about the tools you are building, and the tools that already exist in the MCP ecosystem?
00:43:19.900 | So, with Managed Inference and Agents, let's say you go in the dashboard to the configuration page,
00:43:28.300 | and here you get access to the model configuration, and it has the toolkit integration and the MCP server list.
00:43:41.500 | So basically there is this MCP, Brave, that I have here, the Brave Search MCP, and you might already have
00:43:49.500 | access to this, so you can attach it to your own Jupyter notebook. Pretty much, to attach an MCP you
00:43:55.740 | click on Manage MCP Servers and you attach it as an application, similar to what we did with the database;
00:44:02.780 | it's just another local application. And there you go: we have an MCP server here that is exposing two tools,
00:44:11.500 | web search and local search. So now we can use these tools on the agents endpoint. So let me go
00:44:19.420 | and take a look at that. These are the instructions, step by step, for how you can enable an MCP.
00:44:24.780 | Then it's just another tool; this time it is not a Heroku tool, it is just an MCP, and this is the name of
00:44:35.100 | the tool that I will execute: mcp_brave
00:44:39.100 | is the name of the namespace. You can have multiple MCPs here, so this is kind of like a namespace for
00:44:51.100 | your MCPs, and I am going to execute brave_web_search. This requires an API key, and I already have the API key
00:45:02.140 | on my application. If you want, you can go ahead and take it; I'm going to remove that API key in a moment,
00:45:07.820 | so don't perform like 2,000 queries, otherwise I'm going to get charged. But now I have access to it, and the
00:45:16.780 | security to also enable those keys for your MCPs. And I will execute it as another tool,
00:45:24.220 | and the prompt that I'm sending is: "what is the most recent news about AI agents?" Let's execute this.
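A sketch of what calling an attached MCP tool through the agents endpoint might look like; the `"type": "mcp"` discriminator and the namespaced name format are assumptions based on the walkthrough, so check the dashboard's tool list for the exact spelling:

```python
import os

payload = {
    "model": os.environ["INFERENCE_MODEL_ID"],
    "messages": [
        {"role": "user", "content": "What is the most recent news about AI agents?"}
    ],
    "tools": [
        {
            # "mcp_brave" is the server namespace and "brave_web_search" the tool,
            # mirroring the walkthrough; the exact type/name format is an assumption.
            "type": "mcp",
            "name": "mcp_brave/brave_web_search",
        }
    ],
}
# POST to the agents endpoint and consume the SSE stream as in the earlier sketches.
```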
00:45:30.220 | And this MCP is running on Heroku, the same as code execution or the Postgres tools: it will spin up a
00:45:41.660 | dyno, run the MCP over standard input and output, and scale back to zero, so you are not paying for
00:45:49.100 | something that is constantly running. And there we go, that's the search from brave_web_search.
00:45:56.860 | So this is the response from the tool, and now it's going to render that after the inference operation.
00:46:06.940 | So I'm running MCPs on Heroku. A quick example of how you can deploy an MCP:
00:46:14.540 | I have an exercise here, deploy an MCP to Heroku. I have a Perplexity Ask MCP that I just forked.
00:46:24.620 | This is the official Perplexity MCP; I forked it to my repository to do just one thing, to make it Heroku-
00:46:33.660 | compatible. You use the Procfile, which is the file on Heroku that defines how an application is executed,
00:46:42.940 | and I added the entry point as an MCP. This is a new process type we support: everything that
00:46:49.420 | starts with mcp is going to be recognized as an MCP on Heroku, and it will execute this standard-input-and-
00:46:56.940 | output code. It also requires an API key, which I don't have, but you can deploy it. To deploy to Heroku I added a
00:47:05.980 | button to make it super easy, click to deploy. I deploy this MCP to my space, and then I attach it to my
00:47:14.060 | application. This is my Perplexity MCP; let's make it available to the space.
00:47:27.020 | I don't have an API key, so I deploy... there are certain MCPs that, if you don't have an API key,
00:47:33.900 | fail when you run them; this is why I'm specifying something, it doesn't need to be valid.
00:47:39.100 | And this is now a Node.js application. It will just deploy this, build this, and keep it available on
00:47:45.500 | Heroku, and you will be able to attach it to your app. So this is an exercise you can do:
00:47:50.940 | write the code to run that MCP that you just deployed.
00:47:55.580 | Now we got the response from the previous Brave search,
00:48:03.340 | and this is pretty much the most recent news about AI agents. But remember that I told you
00:48:10.060 | that you can also use those MCPs outside of Heroku, not only for Heroku agents. Let's say you are using
00:48:18.300 | Cursor or Claude Desktop, or you are writing your own agent on a different platform, but you want to have
00:48:25.340 | those MCPs available remotely; you can use them too. So here on the management dashboard you see the
00:48:33.740 | toolkit integration page. All of the MCPs that I deploy to my app are going to be available through this
00:48:41.340 | endpoint. This is a server-sent events endpoint, and it is authenticated behind a bearer token, and
00:48:49.820 | we are working on OAuth support so that you can run it securely; you'll be able to build remote MCP servers
00:48:57.340 | on Heroku that are accessible without bearer tokens. We don't like bearer tokens.
00:49:01.820 | Perfect, so now that the MCP is there, let's add my Perplexity one really quick; let's refresh.
00:49:09.740 | Perfect, I have it, it's available. I go to my toolkit and copy the token.
00:49:19.340 | It is the same token that you use for inference, so I don't need to do this;
00:49:23.740 | you just need the URL. And in my Jupyter code I have a basic MCP client I'm using,
00:49:31.020 | so I need to install the dependency here; I'm using the mcp package from Anthropic.
00:49:42.380 | And I'm creating an MCP client. That MCP client is going to my Heroku endpoint, passing that API key as an
00:49:51.580 | authorization header. Let's run it to list the tools that are available, and then execute
00:50:00.060 | brave_web_search. So I'm executing the MCP that I deployed on Heroku, from outside of Heroku.
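A minimal sketch of such a client using the `mcp` Python SDK, assuming the toolkit's SSE URL is stored in a hypothetical MCP_SERVER_URL variable and that the bearer token is the same INFERENCE_KEY used for inference (as stated in the talk):

```python
import asyncio
import os

from mcp import ClientSession
from mcp.client.sse import sse_client

MCP_SERVER_URL = os.environ["MCP_SERVER_URL"]  # hypothetical name; copy it from the toolkit page
TOKEN = os.environ["INFERENCE_KEY"]            # same token used for inference, per the talk

async def main() -> None:
    headers = {"Authorization": f"Bearer {TOKEN}"}
    # Open the SSE transport, then run the MCP handshake over it.
    async with sse_client(MCP_SERVER_URL, headers=headers) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])  # e.g. brave_web_search, perplexity_ask
            result = await session.call_tool(
                "brave_web_search", {"query": "latest news about AI agents"}
            )
            print(result)  # raw tool result; no post-processing in this basic example

asyncio.run(main())
```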
00:50:08.380 | Let's run this demo.
00:50:09.900 | And it connected; we have the following tools: brave_local_search, brave_web_search, and perplexity_ask,
00:50:21.340 | the one that we just deployed and enabled with just two clicks.
00:50:24.540 | Then it executed the web search. I am not processing the result here, this is just a very basic
00:50:32.460 | example, but that's how you can deploy an MCP and use it externally. And I also have
00:50:39.420 | an MCP gateway that I use for MCPs that I deploy; I just use the same approach to use
00:50:47.500 | my MCPs in Cursor. And last but not least, I told you that at least the chat completions endpoint is
00:50:57.020 | compatible with the OpenAI API, 95%; we are working to bring it to 99%. There are certain parameters that are
00:51:04.860 | not supported, but you can just use the SDK. So let's use the SDK to perform a basic operation. I am just
00:51:13.900 | using the OpenAI SDK with the API key and URL from Heroku. I perform an inference,
00:51:22.060 | and we should get a response
00:51:27.500 | in a moment.
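A minimal sketch of that SDK call; pointing `base_url` at INFERENCE_URL plus `/v1` is an assumption about the path layout:

```python
import os

from openai import OpenAI

# The endpoint is largely OpenAI-compatible, so the stock SDK works by pointing
# base_url at it; appending "/v1" is an assumption to verify against the docs.
client = OpenAI(
    api_key=os.environ["INFERENCE_KEY"],
    base_url=f"{os.environ['INFERENCE_URL']}/v1",
)

completion = client.chat.completions.create(
    model=os.environ["INFERENCE_MODEL_ID"],
    messages=[{"role": "user", "content": "Say hello from Heroku Managed Inference."}],
)
print(completion.choices[0].message.content)
```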
00:51:33.900 | And there you go, we have the response. And that's pretty much what we had for you today. You get
00:51:41.900 | access to this notebook; keep playing with it. I extended access to the Heroku platform for that team
00:51:48.860 | until the weekend, because right now, unfortunately, we don't have a free tier; we are working hard to bring
00:51:54.780 | it back. But you can go deploy and try it out, and if you have any questions, please
00:52:03.260 | connect with us on social. We have the Dev Center, the site for documentation, and the Heroku AI website.
00:52:14.540 | We created a Heroku community on Twitter/X. These slides are on the Slack, so
00:52:22.460 | you can get them from the Slack. And thank you very much, I hope you enjoyed this workshop.