
Building Agentic Applications w/ Heroku Managed Inference and Agents — Julián Duque & Anush Dsouza


Chapters

0:00 Introduction to Heroku AI
4:24 Core Mission: The product's goal is to make every software engineer an AI engineer. Anush Dsouza, the Product Manager, states Heroku wants to make it “simple to attach agents and AI to your application.”
5:01 Agentic Control Loop: Heroku provides an "agentic control loop" running on its platform. This loop gives AI models access to tools like code execution and data access, all secured under Heroku's trust layer.
6:25 AI Primitives: Heroku AI is built on key primitives. These include inference for accessing curated models, the Model Context Protocol (MCP) for extending app functionality, and pgvector for handling embeddings.
7:08 Trusted Compute: Heroku's trusted compute layer, Dynos, runs first-party tools. They plan to expand this with tools for web search and memory, and users can bring their own tools via MCP.
13:23 Managed Inference: This service allows you to run AI models directly within your Heroku infrastructure. This keeps your data within your application's network for enhanced security.
14:38 Supported Models: The platform supports text-to-text models from Anthropic (Claude 3.5, 3.7, and 4), embeddings from Cohere Embed, and image generation with Stable Image Ultra.
50:51 Chat Completions API: The basic chat completions endpoint is designed to be highly compatible with the OpenAI and Anthropic APIs. The presenter notes it's "95% compatible with the OpenAI API," allowing the use of the OpenAI SDK. It supports standard parameters like temperature and max_tokens, as well as streaming responses.

Whisper Transcript

00:00:00.000 | Welcome, welcome you all to this lunch and learn. I hope you are enjoying the lunch
00:00:19.440 | that the conference brought you. We are going to be talking about our recent product that
00:00:27.840 | we announced at Heroku, which is Heroku Managed Inference and Agents, part of the Heroku AI
00:00:34.080 | offering, and we will be talking about the fundamentals of building agentic applications
00:00:39.360 | with this service. This is not a session, this is a workshop, so if you want to follow along,
00:00:45.720 | just bring your laptop. You only need a browser; you are not going to install anything on your computer,
00:00:51.420 | I swear. It is going to be easy to follow along. Also, this content is going to be available for you if
00:00:58.860 | you want to continue learning at home, with access to the Heroku platform during the event and
00:01:06.540 | the weekend. So initially we are going to do the setup, while you get ready by signing up for a Heroku
00:01:16.780 | account, or if you already have a Heroku account, you can just get access to the platform; I will
00:01:22.260 | give you a link for you to follow. Then my friend here is going to give you an overview of what we
00:01:28.620 | released, and then we will go and do the hands-on. My name is Julián Duque, I'm a principal developer advocate
00:01:34.860 | for Heroku, and I am here with Anush. Yeah, hi, my name is Anush, I work as a product manager for Heroku AI.
00:01:40.700 | Thank you everyone for joining us, very excited to talk to you all as well as walk you through how simple Heroku is.
00:01:48.140 | Of course. And let's get started. You can join the workshop Heroku AI channel in the AI Engineer Slack
00:01:57.580 | workspace; there I shared the slides and some links. Also, this is the QR code that is going to take you
00:02:05.740 | to the workshop site. On the workshop site you are going to find a link that has the content,
00:02:13.500 | and a form where you enter your Heroku account email. If you already have a Heroku account, just enter the
00:02:21.340 | email that you used to sign up for Heroku. If you don't have a Heroku account, go ahead and sign up for one;
00:02:27.580 | you don't need to put in any credit card information. With that form you are going to get access to
00:02:35.020 | Heroku, and you will be able to deploy applications and use our services for the duration of this week.
00:02:42.700 | If you have any problems with the sign-up process, let us know, we are here to help. And this is a follow-
00:02:48.780 | along workshop. What we are going to do is deploy a Jupyter notebook to Heroku, and then we are going to
00:02:56.700 | load all of the workshop content into this Jupyter notebook, and from there I will take you through the
00:03:03.820 | workshop material. So let's take a picture of this. Also make sure you join the Slack channel; the
00:03:09.900 | information is in there. And while you are doing the setup, my friend Anush here is going to give you an
00:03:17.180 | overview of Heroku AI and what we released. Okay, thank you for joining us, friends. How many of you are familiar
00:03:26.940 | with Heroku? Have you used Heroku before? Show of hands. Oh, nice. So, to set the stage: this is the most exciting time to
00:03:37.900 | be building right now, especially building with AI. And this point has come before, right? There have been
00:03:45.180 | inflection points of technology that have fundamentally changed the way we build things. There was the
00:03:49.980 | internet, there was the cloud, there were web apps, and now there's AI. Previously, when Heroku started,
00:03:56.060 | we took on a similar challenge. People wanted to build and deploy their apps, especially with Ruby on Rails, but
00:04:02.060 | it was hard. It was hard to operate, it was hard to deploy, it was hard to scale. Heroku made that
00:04:07.500 | super simple with git push heroku main, and a whole new host of developers could easily build, push, and scale
00:04:14.060 | their apps. We are doing a similar thing right now: we have taken on the challenge with AI. We have seen
00:04:19.980 | that people want to build with AI. They want to build agentic applications; they want to build agents that scale and
00:04:25.340 | operate in a way that is very simple. We want to make sure that every software engineer right now is an AI engineer,
00:04:32.620 | and that it is as simple as attaching agents and AI to your apps. So how do you do that currently? A lot of
00:04:40.860 | solutions are only for the day-one problems, but what happens on day two? How do you operate it? How do you
00:04:46.700 | scale it? There are so many models out there; how do you know which is the right model for your problem? How
00:04:52.940 | do you know your tools are running safely? Heroku has taken a very opinionated and curated set of
00:04:58.060 | models that we believe work best for our customers and that developers will enjoy. We have expanded that
00:05:05.020 | further by deeply entrenching these models in an agentic control loop that runs on Heroku, with access
00:05:11.100 | to tools like code execution and access to your data, all under the trust layer of Heroku. And for this extension
00:05:18.060 | of agents we are using the Model Context Protocol. You might have seen online that people keep asking
00:05:24.620 | who's going to build the Heroku of AI, or "this is the Heroku of X". So, to the question of who's
00:05:31.660 | building the Heroku of AI: it's Heroku, of course. Why wouldn't we? Okay, so what are the challenges right
00:05:39.260 | now that people are facing? One of the things that I can see is: how do you figure out that
00:05:45.260 | this model works best for you? How do you know that it's evaluated and traced and has the right
00:05:50.860 | technologies to make sure that it is performing the way you want it to perform? So these are the
00:05:56.060 | challenges that we're taking on. We are curating these things such that whenever you work with agentic
00:06:00.300 | applications, you don't feel like it's a bunch of knobs and bells and whistles like a plane; it's
00:06:05.820 | actually pretty simple. By taking the opinionated approach of having those defaults for you, you can get
00:06:11.820 | started with as little as one CLI command. You just do heroku ai:models:create, and that attaches AI to
00:06:19.020 | your app as a resource. Moving on to the next slide, of course. Okay, so we have three major things that
00:06:26.700 | work great for building AI applications, or agentic applications. One: we offer primitives like inference,
00:06:33.740 | where you can take a curated set of models and access them in your apps. We also have the Model Context
00:06:39.740 | Protocol to extend your apps; you can build remote MCP servers on Heroku in a simplified way, and you can also
00:06:45.740 | build standard I/O MCP servers that run in Heroku's trusted compute, and they can also scale to zero, so
00:06:51.820 | that you do not pay for things that you're not using. And we also have pgvector, which is a great vector
00:06:58.380 | database for embeddings. So all of these together give you all the primitives that you need
00:07:04.060 | to build agentic applications. So how do we do this? Heroku has a trusted compute layer called dynos,
00:07:14.380 | and these computers are the ones that run tools for you. For example, we provide first-party tools like code
00:07:23.740 | execution that run on Heroku's compute, and these can run on the compute and stream the data back to you and
00:07:30.060 | solve your problems. We plan to offer additional first-party tools such as web search for grounding, and
00:07:36.940 | memory. Memory is really important right now, so in the future we'll probably offer memory
00:07:42.140 | and other compute tools that can run on Heroku's compute and be provided to you. You can bring your own
00:07:46.860 | tools as well using MCP; they can run on our compute and stream things back to agents. Okay, with that I'll
00:07:54.860 | hand it over to Julián for a more hands-on workshop, and you can build all the things I spoke about within the
00:08:00.940 | next 30 minutes. Okay, let's go back to this last picture for the folks that just joined, so you get
00:08:07.820 | access to the workshop content. And remember, we have the Slack channel in the ai.engineer workspace,
00:08:15.260 | where I already shared these slides and the link so you can follow along. You can follow along with me
00:08:20.460 | right now, or you can follow along at home with more time. So what we are going to do is the following:
00:08:27.980 | let me show you. That link is going to take you to this page. On this page you have access to the
00:08:34.860 | workshop resources. This is a website that has the step-by-step instructions for what you are going to
00:08:40.620 | be doing today; I'm going to be following these same steps right here, right now. And second, we have a way for
00:08:48.540 | you to get access to Heroku. If you already have a Heroku account, just put in your email, the email that you
00:08:55.340 | use to log into Heroku. If you don't have one, go and sign up first. You can sign up for Heroku; you don't need
00:09:02.540 | to enter your credit card information. With just this invitation you are going to get access to the workshop.
00:09:08.540 | These services are going to be enabled until the 7th, like this weekend, and I can extend it if you
00:09:15.980 | like the content we have for you. This is going to send you an invitation to a team. The team is
00:09:22.460 | called AI Engineer World's Fair. Go accept that invitation, and then you have access to this dashboard.
00:09:30.860 | So for the Heroku dashboard, you go to dashboard.heroku.com, you log in, and you should get access to this view.
00:09:38.460 | We already have some pre-deployed applications here: the workshop that I will be showing you, the
00:09:45.900 | Jupyter notebook, and the Brave Search MCP. We are going to use a pre-existing MCP
00:09:51.900 | to give you an example of how to call that MCP from the Managed Inference and Agents endpoint. And I also
00:09:59.500 | have another application here, which is the AI engineer data app; it has a Heroku PostgreSQL database,
00:10:05.020 | and we are going to see how we can build agents that have access to this database as well. But we start the workshop
00:10:14.140 | first by deploying a Heroku application. So we have this Heroku Jupyter template
00:10:22.300 | that you can deploy to Heroku. Go to the repository after you have the account and are logged into Heroku,
00:10:32.620 | and just click to deploy. With click-to-deploy you are getting this page that will ask you for an application
00:10:42.620 | name; make sure that this application name is unique. And the internet situation is a little bit slow; let's
00:10:50.780 | hope it gets better so I can show you things working live, otherwise I need to start getting access somewhere.
00:11:02.140 | If you just joined, please go to the Slack channel (it's called workshop-heroku-ai) in the AI
00:11:10.860 | Engineer Slack, and you can find the how-to-get-started instructions there. If you are stuck somewhere
00:11:16.540 | or you don't know how to get started, please raise your hands and I can come help you out. Beautiful, this is
00:11:21.580 | the deployment page. We are going to create a new application; remember to use a unique name.
00:11:26.700 | Let's do... this is jduque-jupyter, this is going to be my Jupyter. It is not unique, and if it is not unique
00:11:34.060 | you are going to get an error, so let's add "workshop" to get a unique name. And for the app owner, if you want this workshop
00:11:43.740 | to work without you paying anything, make sure to select AI Engineer World's Fair, which is the team we
00:11:52.620 | have invited you to. These Jupyter notebooks are password protected, so there is an environment variable
00:12:01.260 | here for you to define the password for this workshop. I'm going to use a super secure password, "lab",
00:12:07.820 | and then click to deploy. So this is going to deploy the application with a dyno where the Jupyter notebook
00:12:16.460 | is going to run. The dyno is pretty much the container unit where your application runs, like the virtual
00:12:23.100 | machine on Heroku. And it also has Heroku Postgres, so all the work that you are going to do in the
00:12:29.260 | Jupyter notebook is going to get persisted into this database. And this is going to take a little bit of
00:12:35.980 | time to install. This is pretty much fetching all the source code of this Jupyter notebook
00:12:41.900 | template to Heroku, building that application and all of the dependencies, and then you have it
00:12:48.540 | available for you to use. So basically you are going to get something like this. I already have this deployed,
00:12:56.220 | but you don't have any workshop content here, so we are going to be loading that notebook.
00:13:03.740 | And to load that notebook, you keep following the instructions. Oh, the third step, which I missed, is
00:13:11.020 | how we can provision Managed Inference, mainly, how we can provision an AI model for a Heroku
00:13:21.180 | application. So Managed Inference is a service that lets you run AI models within the same infrastructure
00:13:29.580 | where your application is running, so the data is not going to third parties. If you are using, let's say,
00:13:34.940 | the OpenAI or Anthropic APIs, the data is going outside your application. This keeps
00:13:42.540 | everything running inside the same network that your application is running in.
00:13:46.380 | And to provision this you can use the Heroku CLI, but we are going to make things easy.
00:13:52.060 | Once the application is deployed... this is taking time to deploy, so let's go to something that is already
00:13:59.500 | deployed, let's say this one that I created before. You go to your application in the dashboard, you
00:14:07.500 | click through to the management page, then you go to Resources,
00:14:14.060 | and Resources is where you can provision add-ons: add-ons like Heroku Postgres or the Key-Value Store,
00:14:20.860 | or any other third-party item that we support on our marketplace. The one that we are going to provision
00:14:26.700 | is called Heroku Managed Inference and Agents; this is our AI offering. So you are going to click
00:14:33.020 | on this add-on, and then you will select the model that you want to use in your application. We support
00:14:41.740 | text-to-text models from Anthropic (Claude 3.5, 3.7, and 4), we support Cohere Embed for embeddings, and we
00:14:50.140 | support Stable Image Ultra for image generation. We are going to be working mostly with text-to-text
00:14:57.020 | inference models, so let's do Claude 4. We submit, and that's it: now my application can access
00:15:06.860 | AI services. That's the only thing that you need to do. And how this works: this is going to enable your
00:15:13.980 | application to have access to environment variables that contain the API URL, the API key, and the model ID,
00:15:22.300 | and with those three things you can use an existing SDK that supports the OpenAI specification
00:15:29.580 | or the OpenAI API, or you can perform an HTTP request, or build your own solution to run these
00:15:38.780 | applications. So now we've provisioned AI into our app. Now our other app is deployed, so we can do the same
00:15:48.460 | with that one. So I'm going to keep following along. Then let's go and load the workshop. I'm going to be copying
00:15:56.700 | this URL; that's the Jupyter notebook we are going to load into the Jupyter template that we deployed. And here
00:16:04.860 | you go to File, Open from URL, you paste that URL, click on Open, and voila, we have our notebook ready to start.
00:16:17.820 | So remember that we provisioned the Managed Inference and Agents service into our application.
00:16:25.100 | That is going to give us three environment variables. If I go here to Settings,
00:16:31.340 | Reveal Config Vars, we see the environment variables there: the inference key, the model ID,
00:16:41.020 | and the inference URL, with the API key being what we will use to call the service. So obviously the first step that
00:16:49.740 | we will do is to load those environment variables in our Jupyter environment. We will do this workshop
00:16:57.260 | in four parts. The first one is setting up the environment; we are going to just load those environment variables
00:17:03.980 | so we can continue doing our activities. Second, we are performing managed inference; this is your basic chat
00:17:11.500 | completions endpoint, and we will see two examples, one doing basic chat completions and the other one
00:17:19.420 | doing a streaming version of that chat completions endpoint. The third, which is one of my favorite ones, is
00:17:26.620 | the Heroku tools, or the agents part. We have another endpoint that lets you run agents that we support
00:17:35.820 | natively: we have code execution agents, we have database access agents, document conversion, executing commands on
00:17:44.700 | a dyno, and we are going to continue adding more to those. And this is just an endpoint that performs the
00:17:51.580 | execution of the agent in the Heroku infrastructure, in our dynos, and it is just a one-off dyno. What is
00:17:58.780 | a one-off dyno? It just spins up, runs the code, and scales back to zero, so you are not paying for an
00:18:06.140 | application that is constantly running; you are just paying for the amount of compute that that tool took to
00:18:12.300 | execute. And that endpoint also supports MCPs, MCPs that you can deploy to Heroku
00:18:19.500 | and then attach to your inference agents endpoint. And last but not least, we are going to take a look
00:18:27.420 | at that MCP support that we have: how you can deploy an MCP to Heroku, attach that MCP to the agents endpoint,
00:18:35.500 | and also use those MCPs externally, remotely, through the MCP gateway or MCP toolkits that we have.
00:18:44.700 | I know it sounds like a lot, but it's going to be easy. I have a couple of exercises for you if you want
00:18:49.820 | to write some code, but basically what I will be doing here is just running the code that I
00:18:56.140 | already have implemented here. (Hi, friend, thank you for coming.) Okay, so the first thing: I told you we need
00:19:03.180 | to load these three environment variables, the inference URL, inference key, and inference model ID, and there is a
00:19:09.580 | fourth one that we will use for certain tool executions on Heroku, which is the target application
00:19:19.340 | name. So what you are going to put here in the target application name is the name of the application you
00:19:27.580 | deployed, the name of the Heroku Jupyter application that you deployed.
00:19:33.500 | In this case my workshop, my Jupyter notebook, is called ai-engineer-workshop; this is the one that I'm going
00:19:41.580 | to use here. I am giving permission to my tool to run commands and perform compute operations on this
00:19:49.500 | dyno, on this application. So here you just replace it with the name of your application so you can do those
00:19:57.100 | examples, and later I will show you how you can also give access to the application that has the database,
00:20:04.780 | to run the other examples that we have for you. So let's load the environment variables. Now the environment
00:20:12.220 | variables are ready, so I can continue with the examples. Then let's go with managed inference.
00:20:20.700 | I mentioned that for managed inference we have the basic chat completions endpoint. This is an endpoint that you
00:20:26.860 | will find on services like OpenAI or Anthropic: basically you ask a question, you send a message
00:20:33.980 | or an array of messages, and you are going to get a response. This endpoint also supports custom tool
00:20:41.500 | execution, but that tool execution is something that happens in your code, so you need to specify the
00:20:48.620 | function and capture the information to execute it; or, if you are using an agent framework like ADK or
00:20:56.300 | LangChain, it also handles custom tool execution internally. But for the example we are going to
00:21:02.860 | perform a basic inference. To perform the basic inference we are going to do a basic HTTP request.
00:21:09.660 | We have the inference key and the model ID in the payload. We are going to use Claude 4, which is the model
00:21:17.420 | that I provisioned here on my app, and the message that I'm going to send as a user is "explain the concept of managed
00:21:25.980 | inference in one sentence". You can adjust the parameters that the OpenAI API supports, like
00:21:32.300 | temperature, max tokens, top_p, etc. You can take a look at the documentation by clicking on the path of the
00:21:42.540 | endpoint, and it will take you to the documentation of that endpoint and the parameters we support.
00:21:47.820 | And then, after performing the HTTP request, we get the response back as JSON.
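For following along at home, here is a minimal sketch of the kind of request this step performs, assuming the add-on's config vars are named INFERENCE_URL, INFERENCE_KEY, and INFERENCE_MODEL_ID (as shown in the dashboard) and that the chat completions path follows the OpenAI-style /v1/chat/completions layout:

```python
import os

import requests

# Assumed config var names set by the Managed Inference add-on (visible in the dashboard).
INFERENCE_URL = os.environ["INFERENCE_URL"]
INFERENCE_KEY = os.environ["INFERENCE_KEY"]
INFERENCE_MODEL_ID = os.environ["INFERENCE_MODEL_ID"]

payload = {
    "model": INFERENCE_MODEL_ID,
    "messages": [
        {"role": "user",
         "content": "Explain the concept of managed inference in one sentence."}
    ],
    # Standard OpenAI-style parameters (temperature, max_tokens, top_p, ...) apply.
    "temperature": 0.5,
    "max_tokens": 256,
}

response = requests.post(
    f"{INFERENCE_URL}/v1/chat/completions",  # assumed OpenAI-compatible path
    headers={"Authorization": f"Bearer {INFERENCE_KEY}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```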
00:21:56.220 | So let's run this and see how it works. We get a response like any other AI endpoint returns: we have a chat
00:22:06.620 | completions object with the different choices, like the list of messages that it returns. Basically we have just
00:22:12.620 | one message here, as an assistant, and the response is "managed inference is a cloud service that handles the
00:22:19.740 | deployment, scaling, and maintenance of machine learning models for real-time predictions", etc. So that's a basic
00:22:26.380 | example; this is something that you can do everywhere, since getting inference became a commodity.
00:22:33.980 | And to show that, we also support streaming. Of course, to have a better experience
00:22:39.660 | while building our applications, we need to support streaming, meaning that it's not going to
00:22:45.660 | return the whole answer as just one JSON object; it's going to start giving you chunks, so you can get that
00:22:52.140 | real-time feedback when getting the response from the inference service.
00:22:57.260 | So here I'm just consuming chunks and printing out those chunks on the screen. The first
00:23:06.380 | sample is going to show you the raw version of this request. This is using a server-sent events
00:23:14.620 | approach: it's sending you message events, every message event has data, and each data payload is a piece of that
00:23:23.580 | completion object, but in this case you are getting a delta, a piece of that message.
00:23:31.180 | So you see here in the content we get nothing, then "streaming is crucial", and then it continues
00:23:39.660 | giving me the response. We get a bunch of different chunks, but let's take a look at how that looks
00:23:46.940 | in an application when you are rendering the chunks; that was just the raw objects, to understand how the API
00:23:53.020 | works. Here I have another example where I am doing some parsing of that information, extracting the delta
00:24:00.380 | content of each response, and then at the end rendering everything as beautiful Markdown. When I execute this,
00:24:07.980 | the inference service is going to start responding in real time, chunk by chunk, with all the messages that I'm
00:24:15.580 | getting from the service. As you can see, it is a Markdown response, so at the end, when we finish,
00:24:21.660 | we use the Markdown display object here in Jupyter to get the beautiful answer. Sometimes it will
00:24:31.100 | give you a code example, depending on what the inference service returns, and that's how this
00:24:38.300 | works, and this is how you can build an application to consume these real-time streams.
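A minimal sketch of the streaming consumer described above, under the same naming assumptions; the `[DONE]` terminator is an OpenAI-style convention and may differ here:

```python
import json
import os

import requests

url = f"{os.environ['INFERENCE_URL']}/v1/chat/completions"  # assumed path, as above
headers = {"Authorization": f"Bearer {os.environ['INFERENCE_KEY']}"}
payload = {
    "model": os.environ["INFERENCE_MODEL_ID"],
    "messages": [{"role": "user", "content": "Why is streaming crucial for chat UIs?"}],
    "stream": True,  # ask for server-sent events instead of a single JSON object
}

collected = []
with requests.post(url, headers=headers, json=payload, stream=True, timeout=120) as r:
    r.raise_for_status()
    for line in r.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data:"):
            continue  # skip blank keep-alive lines and event-name lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # OpenAI-style stream terminator (assumed)
            break
        chunk = json.loads(data)
        if not chunk.get("choices"):
            continue
        delta = chunk["choices"][0].get("delta", {}).get("content") or ""
        collected.append(delta)
        print(delta, end="", flush=True)  # render the answer chunk by chunk

full_answer = "".join(collected)  # e.g. feed this to a Markdown renderer at the end
```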
00:24:44.780 | But now let's talk about the good part, which is the agents. Who has built an agent before, like an
00:24:52.780 | application that executes a tool or runs some code? Perfect, we have a couple of hands there, amazing.
00:25:00.540 | So, these are the tools that we support today (and Anush here, as product manager, can tell us what we are going to be
00:25:06.060 | supporting in the future). Dyno run command: dyno run command allows you to execute a command on a dyno,
00:25:15.100 | like a Unix command or a script that you already have pre-deployed on your application. It will
00:25:21.980 | execute that and give you the response. So this is pretty good for running trusted code, code that you have written,
00:25:29.580 | that you know works, that has a predictable result, and you can just have that
00:25:37.180 | code running on your dyno and execute it through this agent, this tool. We have query databases: we
00:25:44.620 | have two tools here, one is postgres get schema and the other one is postgres run query. Postgres get schema:
00:25:52.700 | I mean, the LLM doesn't know the shape of your database. If you ask for data, it will hallucinate an SQL query;
00:26:00.620 | or, if you have some sort of retrieval-augmented generation, or if you are grounding the prompt
00:26:08.220 | with the shape of your database, it will generate something that is close to the shape of the data
00:26:14.780 | in your database. But with this tool it will get exactly what you have in the database, and then
00:26:22.620 | we can pass that schema to the next tool, which is run query. It will generate a query that will run on
00:26:29.260 | your database, get the result, and make inference over that data. Then we have a couple of other tools for
00:26:38.220 | document transformation: HTML to Markdown and PDF to Markdown. You pass a URL, like a website or the URL of a PDF
00:26:47.820 | that is hosted somewhere, and it will give you the Markdown response so the inference can work over
00:26:53.740 | that text content; it's going to perform that text extraction for you. And my favorites are the code
00:27:00.540 | execution ones. We support, as of today, Python, Node, Ruby, and Go. So the LLM will generate code that will run on
00:27:09.980 | Heroku, on a one-off dyno, and it will return the response back to the inference service. And these also
00:27:16.700 | support dependency installation, so the input has a packages array parameter where
00:27:24.860 | the LLM automatically says, "okay, my Python script is going to use pandas or NumPy, let's install them before
00:27:33.900 | I attempt to execute the tool". And all of these are just MCPs that we natively support, but you can also extend
00:27:42.940 | this agents endpoint by deploying your own MCPs. So now let's take a look. The agents endpoint is
00:27:50.780 | different from chat completions. It works similarly, with some minor differences. For example, all of the
00:27:59.180 | responses are a stream, so it doesn't support the synchronous call where it waits to give you
00:28:05.900 | the full response, because code execution, database access, and tool execution take time, so we
00:28:11.740 | prefer to have it as a stream. And then, depending on the tool you are using, you might need to give
00:28:18.540 | the tool access to your database or to an application, for example, to execute a command. So let's try the first one.
00:28:28.780 | Let's run a demo to run a command on a Heroku dyno. Here we are setting up the payload. Similar to chat
00:28:39.740 | completions, we have the model ID, and we have the message: we are asking,
00:28:46.380 | "what is the current date and time on the server?" LLMs don't know anything about the real time;
00:28:53.180 | they need to use tools to be able to have context about what's going on right now. I have asked an LLM
00:28:59.420 | for information and it gives me information from 2024, because that's the date when they stopped
00:29:04.860 | training the model. But with this, it knows exactly what time it is, because it's running this command on the server.
00:29:12.780 | So the tool is a Heroku tool, so I'm specifying the type to run this on Heroku. The tool that I'm executing
00:29:20.860 | is the dyno run command, and then the parameters for this tool are: the target app name (remember, I told
00:29:28.780 | you, use the name of the application you deployed), then the command to run, which is date, and a description of
00:29:37.340 | pretty much what the output of this command means.
00:29:43.500 | And I'm calling the agents Heroku endpoint, passing that payload that has the tool.
00:29:52.620 | This is also a streaming endpoint, so I'm handling the stream output here, and I'm just extracting the tool
00:30:00.380 | calls. This is also a standard API shape that other APIs support for tool calling, but here the tools run on Heroku.
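A sketch of what that request might look like. The endpoint path (/v1/agents/heroku), the `heroku_tool` type, and the nested `runtime_params`/`tool_params` field names are reconstructed from the walkthrough, so treat them as assumptions to verify against the add-on docs:

```python
import json
import os

import requests

APP_NAME = "ai-engineer-workshop"  # the app the tool is allowed to run against

payload = {
    "model": os.environ["INFERENCE_MODEL_ID"],
    "messages": [
        {"role": "user", "content": "What is the current date and time on the server?"}
    ],
    "tools": [
        {
            "type": "heroku_tool",        # assumed discriminator for first-party tools
            "name": "dyno_run_command",
            "runtime_params": {           # assumed nesting; check the docs
                "target_app_name": APP_NAME,
                "tool_params": {
                    "cmd": "date",
                    "description": "Returns the server's current date and time.",
                },
            },
        }
    ],
}

# The agents endpoint always streams, so consume it as server-sent events.
with requests.post(
    f"{os.environ['INFERENCE_URL']}/v1/agents/heroku",  # assumed agents path
    headers={"Authorization": f"Bearer {os.environ['INFERENCE_KEY']}"},
    json=payload,
    stream=True,
    timeout=300,
) as r:
    r.raise_for_status()
    for line in r.iter_lines(decode_unicode=True):
        if line and line.startswith("data:"):
            data = line[len("data:"):].strip()
            if data and data != "[DONE]":
                print(json.loads(data))  # raw chunks: tool calls, results, completions
```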
00:30:09.260 | So let's execute it and see the example, and I hope you
00:30:18.220 | can execute this tool in your Jupyter notebook too. So it is running this on my ai-engineer-workshop
00:30:25.500 | application; it's going to execute the dyno run command. So now it is waiting until it runs the command on my
00:30:32.700 | app. We get the date and time, and then the inference response. Cool. I mean, it's a basic example, but a
00:30:41.900 | powerful one: instead of running date, imagine if you deploy your own script that does your data extraction
00:30:49.180 | or connects to external services; you can just call it from here, from this endpoint, as well.
00:30:55.100 | The other one is code execution. With code execution I am asking the inference service to perform an
00:31:04.140 | operation, and I am passing that code execution tool. Now that it has that tool enabled, it will
00:31:12.620 | generate code in that language, run it on Heroku, and give you the response back.
00:31:17.420 | For this example I will invite you to change node to python, ruby, or go to see different responses.
00:31:26.700 | I'm a Node developer, even though I have a bunch of Python here; some of you might recognize this is LLM-generated, so,
00:31:34.860 | sorry, not sorry, that's what tools are for. And I'm going to be executing code_exec_node to perform
00:31:43.340 | this operation: what is the 30th Fibonacci number? This is a basic algorithm; I just wanted to do something
00:31:49.580 | easy. I will invite you to change this to perform a different operation and see what you can do;
00:31:55.660 | try to break this thing. And for the execution I am just parsing the response, so I
00:32:01.340 | get beautiful Markdown code highlighting and everything. So let's perform the operation.
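A sketch of the code-execution variant, reusing the same assumed endpoint and field names as the previous sketch; swap `code_exec_node` for `code_exec_python`, `code_exec_ruby`, or `code_exec_go` as the presenter suggests:

```python
import os

import requests

# Tool names follow the walkthrough (code_exec_node, code_exec_python, code_exec_ruby,
# code_exec_go). Per the talk, the model can also request dependency installs
# (e.g. pandas, numpy) through a "packages" array in the tool input.
payload = {
    "model": os.environ["INFERENCE_MODEL_ID"],
    "messages": [{"role": "user", "content": "What is the 30th Fibonacci number?"}],
    "tools": [{"type": "heroku_tool", "name": "code_exec_node"}],
}

with requests.post(
    f"{os.environ['INFERENCE_URL']}/v1/agents/heroku",  # assumed agents path, as above
    headers={"Authorization": f"Bearer {os.environ['INFERENCE_KEY']}"},
    json=payload,
    stream=True,
    timeout=300,
) as r:
    r.raise_for_status()
    for line in r.iter_lines(decode_unicode=True):
        if line.startswith("data:") and line[5:].strip() not in ("", "[DONE]"):
            print(line[5:].strip())  # generated code, execution output, final answer
```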
00:32:09.020 | It will generate the algorithm. Let's see. Okay, it will execute code_exec_node; this is the code that it's
00:32:22.460 | passing as the input, this is the beautiful Markdown syntax-highlighted code that it will execute.
00:32:28.780 | That's JavaScript, yes, that's JavaScript. And at the end it's going to execute that code on Heroku, and then
00:32:35.580 | I am getting the response back: the Fibonacci number is 233. But I want to do that in Go, so just change
00:32:45.660 | from node to go, execute this, and it will do the same, but now the code that we are going to
00:32:53.820 | see here is Go code. It will do the same thing: get this Go code on a Heroku dyno, compile it, run it, and
00:33:01.180 | get you the response back. That looks like Go code to me. I am not a Go developer, but that definitely
00:33:08.300 | looks like Go. And it executed the code and gave me the same result, plus an explanation, the inference
00:33:16.140 | operation over that tool execution. That's cool, but so far we have just called one tool. The good thing
00:33:23.500 | about the agents is that we can chain calls together, mix and match; we have different agents acting
00:33:30.540 | together. So now let's complicate things a little bit more, and we are going to use two tools. One is the
00:33:36.780 | HTML to Markdown: go to a website, do something, then take that result and use it with this other agent.
00:33:45.100 | For that we are going to use two tools, HTML to Markdown and code execution Python. The prompt here is:
00:33:55.420 | use the Python snippet from the Wikipedia page for the Euclidean algorithm to calculate the greatest common divisor of
00:34:04.060 | 252 and 105. So it doesn't know the algorithm; I am telling it directly: go fetch the exact one that is on
00:34:16.300 | Wikipedia, then run this code on Heroku, and give me the response. I'm enabling those two tools, HTML to Markdown and
00:34:24.940 | code execution Python. Let's run this, and now we are going to see multiple tool calls:
00:34:32.300 | getting the page in Markdown. So now it has recognized that the Euclidean algorithm page is this one;
00:34:39.180 | now it is reading the whole content of that page in Markdown, attempting to get the algorithm from there;
00:34:46.460 | then it's going to generate the code, then it's going to run the code, and at the end I'm going to get the
00:34:51.420 | response. This execution takes a little bit more time because it is performing multiple tool runs at the
00:34:57.660 | same time.
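A sketch of the two-tool payload for this chained run; the tool names `html_to_markdown` and `code_exec_python` mirror the walkthrough, and the request is sent exactly like the earlier agent calls:

```python
import os

# Two tools; the model decides the order: fetch the page first, then run the code.
payload = {
    "model": os.environ["INFERENCE_MODEL_ID"],
    "messages": [{
        "role": "user",
        "content": (
            "Use the Python snippet from the Wikipedia page for the Euclidean "
            "algorithm to calculate the greatest common divisor of 252 and 105."
        ),
    }],
    "tools": [
        {"type": "heroku_tool", "name": "html_to_markdown"},
        {"type": "heroku_tool", "name": "code_exec_python"},
    ],
}
# POST this to the agents endpoint and consume the SSE stream exactly as in the
# dyno_run_command sketch above.
```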
00:34:58.220 | Any questions so far? Is anybody doing it? If not, you have access to the workshop; you can do it
00:35:07.020 | at home with more time, modify the code, analyze it, and make sure
00:35:13.500 | the concept is understood. So it got the algorithm here, and it executes the thing on Heroku. Now I'm just
00:35:23.100 | waiting for the last inference step... and that execution is done, there you go: "I found the Python implementation
00:35:31.900 | from Wikipedia, I calculated that number that you asked me for, this is the Python snippet from Wikipedia,
00:35:38.860 | and then it's implemented in this code, and this is how I did it," with the explanation. Beautiful, it works.
00:35:47.420 | Now, before I get into the Postgres execution, we need to have access to the database, and right now the
00:35:57.340 | application that you have doesn't have access to that database. If you want to give it a try,
00:36:03.260 | you are going to go to the Heroku dashboard (where is the Heroku dashboard? here),
00:36:11.820 | to the AI Engineer World's Fair team,
00:36:19.500 | click on ai-engineer-data,
00:36:21.980 | go to Resources,
00:36:27.660 | and we have one thing here: the Postgres database tools only work on followers. Followers on Heroku are read-
00:36:39.820 | only. For security reasons, we don't want to give an LLM tool write access to your database, because of course
00:36:48.940 | LLMs are not trusted for that. If you want to give write access to your production database to an
00:36:56.140 | agent, I will invite you to deploy an MCP that does that, like a PostgreSQL MCP, and give the access at your
00:37:03.900 | own risk. With the ones we maintain, we want to make sure we don't break production. So we have two different
00:37:11.340 | databases here. The master one, or the main one, is DATABASE; don't touch that one. The other one, the one that
00:37:18.620 | has attachments, is the follower. So you are going to expand these managed attachments and add your application,
00:37:29.980 | like julian-jupyter, and that's it: now I gave my application access to the database.
00:37:40.140 | Second, you will need to get the name of that database. So go to your application,
00:37:48.860 | in this case julian-jupyter,
00:37:51.020 | go to Resources,
00:37:55.020 | and this database here,
00:37:58.940 | ai-engineer-data:
00:38:06.060 | in my application it is called DATABASE, so that's the name I'm going to use
00:38:12.860 | in code. Since I am working on ai-engineer-workshop, the name is different, so I'm going to show you, so
00:38:23.980 | you can recognize the differences in names. This is just the name of the environment variable on Heroku
00:38:30.300 | that has the connection string. So here I have access to that database on this application; my database is
00:38:39.180 | called HEROKU_POSTGRESQL_AQUA, so that's the name you are going to get and change in your code:
00:38:46.300 | HEROKU_POSTGRESQL_AQUA.
00:38:49.820 | So now I have access to that specific database on my app. I cannot access databases from other
00:38:57.500 | applications; this is for security reasons. This is why I have to do the attachment first,
00:39:02.220 | to be able to give permissions to my database. If the database lives in your application,
00:39:08.300 | you don't need to do this, because your application already has access to it. That's
00:39:13.100 | the only thing that we need to do here to set up. Then we are going to enable two tools,
00:39:20.860 | postgres get schema and postgres run query. These are going to run on my application (remember
00:39:28.620 | the target app name), and they are going to access that specific database; I'm giving permission to go to
00:39:37.020 | those two places. So the database that we have here is a database for a solar energy company. It contains
00:39:45.660 | a table full of metrics, with energy consumed and energy produced every hour, in kilowatt-hours,
00:39:54.700 | and we have metrics for maybe two months. So now, what type of applications can we build with this?
00:40:02.940 | Asking questions like "how much energy has been saved in the last 30 days?": it's going to
00:40:09.740 | understand the shape of the database, it will see that it has a metrics table with these columns,
00:40:16.220 | then it will go and generate an SQL query to be able to get this information, and then run that query and
00:40:24.700 | give you the response. Let's execute that to see how it works.
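A sketch of enabling the two database tools; `postgres_get_schema`, `postgres_run_query`, and the attachment parameter name are again assumptions reconstructed from the walkthrough:

```python
import os

# The attachment name shown under Resources; yours will differ (e.g. DATABASE
# when the database lives in your own app).
DB_ATTACHMENT = "HEROKU_POSTGRESQL_AQUA"
APP_NAME = "ai-engineer-workshop"  # the app the follower database is attached to

payload = {
    "model": os.environ["INFERENCE_MODEL_ID"],
    "messages": [
        {"role": "user", "content": "How much energy has been saved in the last 30 days?"}
    ],
    "tools": [
        # Field names reconstructed from the walkthrough; consult the add-on docs
        # for the exact schema.
        {
            "type": "heroku_tool",
            "name": "postgres_get_schema",
            "runtime_params": {
                "target_app_name": APP_NAME,
                "tool_params": {"db_attachment": DB_ATTACHMENT},
            },
        },
        {
            "type": "heroku_tool",
            "name": "postgres_run_query",
            "runtime_params": {
                "target_app_name": APP_NAME,
                "tool_params": {"db_attachment": DB_ATTACHMENT},
            },
        },
    ],
}
# POST to the agents endpoint and consume the stream exactly as in the earlier sketches.
```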
00:40:33.820 | So the first step is: "I don't understand your database, let me get the schema."
00:40:39.340 | So to get the schema, it executes the tool,
00:40:44.460 | and then you are going to see the full schema printed here on the screen in a moment. So now we have
00:40:50.940 | the schema of my database; it sees the different tables: metrics with its fields, products, systems, users, etc. So I
00:41:02.780 | have the information that I need here. Beautiful. Now it will run the query: it already generated the
00:41:11.260 | query, which it is a hundred percent sure will run on my database, because it knows the shape.
00:41:17.740 | This is the response; it got the data. And now that it has the data, it's going to perform the
00:41:24.460 | inference: it's going to extract the answer from that data, and at the end it's going to tell me the information.
00:41:31.020 | It wanted to run more queries to get a breakdown of the energy by system. I have three different systems:
00:41:39.260 | one system that performs well, another one that doesn't perform very well, and another one that is
00:41:44.940 | totally horrible. So it is also trying to get more information, just from one question. It is doing
00:41:51.340 | inference on my data, and I just enabled two tools; I didn't need to do anything else. And there we go:
00:41:59.580 | the report. These are the key metrics, this is how much you have saved, and this is the breakdown per
00:42:06.780 | system: the best performer, the good performer, and the energy deficit. This is getting access to your data
00:42:14.300 | and acting over your data. We can add a third tool, let's say code execution Python: "I want to generate a
00:42:21.820 | graphic", so it will then do a matplotlib graphic with that data, and you can keep adding more things.
00:42:31.180 | But the more agents and tools that you add, the more time it will take to perform the whole operation.
00:42:37.820 | But that's an example of how you can mix and match these tools that we have here, and how you can give agents
00:42:45.340 | access to your data. I have a couple of exercises: just try to come up with an example that uses a PDF, extracts
00:42:54.380 | something from that PDF, and runs code. You can do this on your own time; we are running low on time.
00:43:00.540 | I want to show you now how we can deploy and run MCPs with this same endpoint. Those are the tools we
00:43:11.020 | maintain, but what about the tools you are building, and the tools that already exist in the MCP ecosystem?
00:43:19.900 | So, with Managed Inference and Agents, let's say you go in the dashboard to the configuration page,
00:43:28.300 | and here you get access to the model configuration, and it has the toolkit integration and the MCP server list.
00:43:41.500 | So basically there is this MCP, Brave, that I have here, the Brave Search MCP, and you might already have
00:43:49.500 | access to this, so you can attach it to your own Jupyter notebook. Pretty much, to attach an MCP you
00:43:55.740 | click on Manage MCP Servers and you attach it as an application, similar to what we did with the database;
00:44:02.780 | it's just another local application. And there you go: we have an MCP server here that is exposing two tools,
00:44:11.500 | web search and local search. So now we can use these tools on the agents endpoint. So let me go
00:44:19.420 | and take a look at that. These are the instructions, step by step, for how you can enable an MCP.
00:44:24.780 | Then it's just another tool; this time it is not a Heroku tool, it is just an MCP, and this is the name of
00:44:35.100 | the tool that I will execute: mcp_brave
00:44:39.100 | is the name of the namespace. You can have multiple MCPs here, so this is kind of like a namespace for
00:44:51.100 | your MCPs, and I am going to execute brave_web_search. This requires an API key, and I already have the API key
00:45:02.140 | on my application. If you want, you can go ahead and take it; I'm going to remove that API key in a moment,
00:45:07.820 | so don't perform like 2,000 queries, otherwise I'm going to get charged. But now I have access to it, and the
00:45:16.780 | security to also enable those keys for your MCPs. And I will execute it as another tool,
00:45:24.220 | and the prompt that I'm sending is: "what is the most recent news about AI agents?" Let's execute this.
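A sketch of what calling an attached MCP tool through the agents endpoint might look like; the `"type": "mcp"` discriminator and the namespaced name format are assumptions based on the walkthrough, so check the dashboard's tool list for the exact spelling:

```python
import os

payload = {
    "model": os.environ["INFERENCE_MODEL_ID"],
    "messages": [
        {"role": "user", "content": "What is the most recent news about AI agents?"}
    ],
    "tools": [
        {
            # "mcp_brave" is the server namespace and "brave_web_search" the tool,
            # mirroring the walkthrough; the exact type/name format is an assumption.
            "type": "mcp",
            "name": "mcp_brave/brave_web_search",
        }
    ],
}
# POST to the agents endpoint and consume the SSE stream as in the earlier sketches.
```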
00:45:30.220 | And this MCP is running on Heroku, the same as code execution or the Postgres tools: it will spin up a
00:45:41.660 | dyno, run the MCP over standard input and output, and scale back to zero, so you are not paying for
00:45:49.100 | something that is constantly running. And there we go, that's the search from brave_web_search.
00:45:56.860 | So this is the response from the tool, and now it's going to render that after the inference operation.
00:46:06.940 | So I'm running MCPs on Heroku. A quick example of how you can deploy an MCP:
00:46:14.540 | I have an exercise here, deploy an MCP to Heroku. I have a Perplexity Ask MCP that I just forked.
00:46:24.620 | This is the official Perplexity MCP; I forked it to my repository to do just one thing, to make it Heroku-
00:46:33.660 | compatible. You use the Procfile, which is the file on Heroku that defines how an application is executed,
00:46:42.940 | and I added the entry point as an MCP. This is a new process type we support: everything that
00:46:49.420 | starts with mcp is going to be recognized as an MCP on Heroku, and it will execute this standard-input-and-
00:46:56.940 | output code. It also requires an API key, which I don't have, but you can deploy it. To deploy to Heroku I added a
00:47:05.980 | button to make it super easy, click to deploy. I deploy this MCP to my space, and then I attach it to my
00:47:14.060 | application. This is my Perplexity MCP; let's make it available to the space.
00:47:27.020 | I don't have an API key, so I deploy... there are certain MCPs that, if you don't have an API key,
00:47:33.900 | fail when you run them; this is why I'm specifying something, it doesn't need to be valid.
00:47:39.100 | And this is now a Node.js application. It will just deploy this, build this, and keep it available on
00:47:45.500 | Heroku, and you will be able to attach it to your app. So this is an exercise you can do:
00:47:50.940 | write the code to run that MCP that you just deployed.
00:47:55.580 | Now we got the response from the previous Brave search,
00:48:03.340 | and this is pretty much the most recent news about AI agents. But remember that I told you
00:48:10.060 | that you can also use those MCPs outside of Heroku, not only for Heroku agents. Let's say you are using
00:48:18.300 | Cursor or Claude Desktop, or you are writing your own agent on a different platform, but you want to have
00:48:25.340 | those MCPs available remotely; you can use them too. So here on the management dashboard you see the
00:48:33.740 | toolkit integration page. All of the MCPs that I deploy to my app are going to be available through this
00:48:41.340 | endpoint. This is a server-sent events endpoint, and it is authenticated behind a bearer token, and
00:48:49.820 | we are working on OAuth support so that you can run it securely; you'll be able to build remote MCP servers
00:48:57.340 | on Heroku that are accessible without bearer tokens. We don't like bearer tokens.
00:49:01.820 | Perfect, so now that the MCP is there, let's add my Perplexity one really quick; let's refresh.
00:49:09.740 | Perfect, I have it, it's available. I go to my toolkit and copy the token.
00:49:19.340 | It is the same token that you use for inference, so I don't need to do this;
00:49:23.740 | you just need the URL. And in my Jupyter code I have a basic MCP client I'm using,
00:49:31.020 | so I need to install the dependency here; I'm using the mcp package from Anthropic.
00:49:42.380 | And I'm creating an MCP client. That MCP client is going to my Heroku endpoint, passing that API key as an
00:49:51.580 | authorization header. Let's run it to list the tools that are available, and then execute
00:50:00.060 | brave_web_search. So I'm executing the MCP that I deployed on Heroku, from outside of Heroku.
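A minimal sketch of such a client using the `mcp` Python SDK, assuming the toolkit's SSE URL is stored in a hypothetical MCP_SERVER_URL variable and that the bearer token is the same INFERENCE_KEY used for inference (as stated in the talk):

```python
import asyncio
import os

from mcp import ClientSession
from mcp.client.sse import sse_client

MCP_SERVER_URL = os.environ["MCP_SERVER_URL"]  # hypothetical name; copy it from the toolkit page
TOKEN = os.environ["INFERENCE_KEY"]            # same token used for inference, per the talk

async def main() -> None:
    headers = {"Authorization": f"Bearer {TOKEN}"}
    # Open the SSE transport, then run the MCP handshake over it.
    async with sse_client(MCP_SERVER_URL, headers=headers) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])  # e.g. brave_web_search, perplexity_ask
            result = await session.call_tool(
                "brave_web_search", {"query": "latest news about AI agents"}
            )
            print(result)  # raw tool result; no post-processing in this basic example

asyncio.run(main())
```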
00:50:08.380 | Let's run this demo.
00:50:09.900 | And it connected; we have the following tools: brave_local_search, brave_web_search, and perplexity_ask,
00:50:21.340 | the one that we just deployed and enabled with just two clicks.
00:50:24.540 | Then it executed the web search. I am not processing the result here, this is just a very basic
00:50:32.460 | example, but that's how you can deploy an MCP and use it externally. And I also have
00:50:39.420 | an MCP gateway that I use for MCPs that I deploy; I just use the same approach to use
00:50:47.500 | my MCPs in Cursor. And last but not least, I told you that at least the chat completions endpoint is
00:50:57.020 | compatible with the OpenAI API, 95%; we are working to bring it to 99%. There are certain parameters that are
00:51:04.860 | not supported, but you can just use the SDK. So let's use the SDK to perform a basic operation. I am just
00:51:13.900 | using the OpenAI SDK with the API key and URL from Heroku. I perform an inference,
00:51:22.060 | and we should get a response
00:51:27.500 | in a moment.
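A minimal sketch of that SDK call; pointing `base_url` at INFERENCE_URL plus `/v1` is an assumption about the path layout:

```python
import os

from openai import OpenAI

# The endpoint is largely OpenAI-compatible, so the stock SDK works by pointing
# base_url at it; appending "/v1" is an assumption to verify against the docs.
client = OpenAI(
    api_key=os.environ["INFERENCE_KEY"],
    base_url=f"{os.environ['INFERENCE_URL']}/v1",
)

completion = client.chat.completions.create(
    model=os.environ["INFERENCE_MODEL_ID"],
    messages=[{"role": "user", "content": "Say hello from Heroku Managed Inference."}],
)
print(completion.choices[0].message.content)
```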
00:51:33.900 | And there you go, we have the response. And that's pretty much what we had for you today. You get
00:51:41.900 | access to this notebook; keep playing with it. I extended access to the Heroku platform for that team
00:51:48.860 | until the weekend, because right now, unfortunately, we don't have a free tier; we are working hard to bring
00:51:54.780 | it back. But you can go deploy and try it out, and if you have any questions, please
00:52:03.260 | connect with us on social. We have the Dev Center, the site for documentation, and the Heroku AI website.
00:52:14.540 | We created a Heroku community on Twitter/X. These slides are on the Slack, so
00:52:22.460 | you can get them from the Slack. And thank you very much, I hope you enjoyed this workshop.