Welcome, everyone, to this lunch and learn — I hope you're enjoying the lunch the conference brought you. We're going to be talking about the product we recently announced at Heroku, Heroku Managed Inference and Agents, part of the Heroku AI offering, and we'll cover the fundamentals of building agentic applications with this service. This is not a session, this is a workshop, so if you want to follow along, just bring your laptop — you only need a browser, you won't install anything on your computer, I swear. It's going to be easy to follow along, and this content will remain available if you want to continue learning at home, with access to the Heroku platform during the event and over the weekend. First we'll do the setup — you get ready by signing up for a Heroku account, or if you already have one, you just get access to the platform; I'll give you a link to follow. Then my friend here will give you an overview of what we released, and then we'll go hands-on. My name is Julián Duque, I'm a Principal Developer Advocate at Heroku, and I'm here with Anush. "Hi, my name is Anush, I work as a Product Manager for Heroku AI. Thank you, everyone, for joining us — very excited to talk to you and to walk you through how simple Heroku is, of course. Let's get started." You can join the workshop Heroku AI channel in the AI Engineer Slack workspace; I've shared the slides and some links there. This QR code takes you to the workshop site. On the workshop site you'll find a link with the content and a form where you enter your Heroku account email. If you already have a Heroku account, just enter the email you used to sign up for Heroku; if you don't have one, go ahead and sign up — you don't need to put in any credit card information. With that form you'll get access to Heroku: you'll be able to deploy applications and use our services for the duration of this week. If you have any problems with the sign-up process, let us know — we're here to help. This is a follow-along workshop: what we're going to do is deploy a Jupyter notebook to Heroku, load all of the workshop content into that notebook, and from there I'll take you through the workshop material. So take a picture of this, and make sure you join the Slack channel — the information is in there. While you're doing the setup, my friend Anush is going to give you an overview of Heroku AI and what we released. "Okay, thank you for joining us, friends. How many of you are familiar with Heroku — have you used Heroku before? Show of hands. Oh, nice. So, to set the stage: this is the most exciting time to be building right now, especially building with AI. And this point has come before, right? There have been inflection points in technology that fundamentally changed the way we build things: there was the internet, there was the cloud, there were web apps, and now there's AI. When Heroku started, we took on a similar challenge. People wanted to build and deploy their apps, especially with Ruby on Rails, but it was hard — hard to operate, hard to deploy, hard to scale. Heroku made that super simple with `git push heroku main`, and a whole new host of developers could easily build, push, and scale their apps. We're doing a similar thing right now: we've taken on the challenge with AI. We've seen that people want to build with AI — they want to build agentic applications, agents that scale and operate in a way that's very simple. We want to make sure that every software engineer right now is an AI engineer, and that it's as simple as attaching agents and AI to your apps. So how are you doing that currently? A lot of solutions only solve the day-one problems — but what happens on day two? How do you operate it?
How do you scale it? There are so many models out there — how do you know a given model is right for your problem? How do you know your tools are running safely? Heroku has taken a very opinionated, curated set of models that we believe work best for our customers and that developers will enjoy. We've expanded that further by deeply entrenching these models in an agentic control loop that runs on Heroku, with access to tools like code execution and access to your data, all under the trust layer of Heroku. And for this extension of agents we're using the Model Context Protocol. You might have seen online that people keep asking who's going to build 'the Heroku of AI,' or 'the Heroku of X.' So, to answer who's building the Heroku of AI: it's Heroku, of course — why wouldn't we? Okay, so what are the challenges people are facing right now? One of the things I can see is: how do you figure out which model works best for you? How do you know it's evaluated and traced, and has the right technologies to make sure it performs the way you want? These are the challenges we're taking on. We're curating these things so that when you work with agentic applications it doesn't feel like a bunch of knobs and bells and whistles like a plane cockpit — it's actually pretty simple. By taking the opinionated approach of providing those defaults for you, you can get started with as little as one CLI command — `heroku ai:models:create` — and that attaches AI to your app as a resource. Moving on to the next slide, of course. Okay, so we have three major things that work great for building agentic applications. One: we offer primitives like inference, where you can take a curated set of models and access them in your apps. We also have the Model Context Protocol to extend your apps — you can build remote MCP servers on Heroku in a simplified way, and you can also build standard-I/O MCP servers that run in Heroku's trusted compute and can scale to zero, so you don't pay for things you're not using. And we also have pgvector, a great vector database extension for embeddings. All of these together give you the primitives you need to build agentic applications. So how do we do this? Heroku has a trusted compute layer called dynos, and these are the machines that run tools for you. For example, we provide first-party tools like code execution that run on Heroku's compute and stream the data back to you to solve your problems. We plan to offer additional first-party tools such as web search for grounding, and memory — memory is really important right now, so in the future we'll probably offer memory and other tools that run on Heroku's compute. You can bring your own tools as well, using MCP — they can run on our compute and stream results back to agents. Okay, with that I'll hand over to Julián for the hands-on workshop, and you can build all the things I spoke about within the next 30 minutes." Okay, let's go back to this last picture for the folks who just joined, so you can get access to the workshop content. Remember we have the Slack channel in the ai.engineer workspace where I already shared these slides and the links, so you can follow along with me right now, or follow along at home with more time. So what we're going to do is the following. That link takes you to this page, where you have access to the workshop resources — a website with step-by-step instructions for what you'll be doing today; I'll be following these same steps right here, right now. Second, we have a way for you to get access to Heroku. If you already have a Heroku account, just put in your email — the email you use to log into Heroku. If you don't have one, go and sign up first; you can sign up for Heroku and you don't need to
enter your credit card information. With this invitation you'll get access to the workshop; these services will be enabled until the 7th — this weekend — and I can extend that if you like the content we have for you. This sends you an invitation to a team called "AI Engineer World's Fair"; go accept that invitation and you'll have access to the dashboard. From the Heroku dashboard — you go to dashboard.heroku.com and log in — you should get this view. We already have some pre-deployed applications here: the workshop I'll be showing you, the Jupyter notebook, and the Brave Search MCP — we're going to use a pre-existing MCP to show you how to call an MCP from the Managed Inference and Agents endpoint. I also have another application here, the AI engineer data app, which has a Heroku PostgreSQL database, and we'll see how we can build agents that have access to this database as well. But we start the workshop by deploying a Heroku application. We have a Heroku Jupyter template you can deploy: go to the repository once you have your account and are logged into Heroku, and click to deploy. You'll get a page that asks for an application name — make sure that application name is unique. The internet connection here is a little slow; let's hope it gets better so I can show you things working live, otherwise I'll need to get access some other way. If you just joined, please go to the Slack channel — the Heroku AI workshop channel in the AI Engineer Slack workspace — where you can find the getting-started instructions. If you're stuck somewhere or don't know how to get started, please raise your hand and I'll come help you out. Beautiful — this is the deployment page. We're going to create a new application; remember to pick a unique name. Let's do "jduque-jupyter" — this is going to be my Jupyter. It is not
unique — and if the name isn't unique you'll get an error — so let's add "workshop" to make it unique. For the app owner, if you want this workshop to work without you paying anything, make sure to select "AI Engineer World's Fair," which is the team we invited you to. These Jupyter notebooks are password protected, so there's an environment variable here where you define the password; for this workshop I'm going to use a super secure password, "lab," and then click to deploy. This deploys the application with a dyno where the Jupyter notebook will run — the dyno is pretty much the container unit where your application runs, like the virtual machine on Heroku — and it also has Heroku Postgres, so all the work you do in the Jupyter notebook gets persisted into this database. This takes a little time to install: it's fetching all the source code of the Jupyter notebook template to Heroku, building the application and all of its dependencies, and then it's available for you to use. Basically you'll get something like this — I already have it deployed, but you won't have any workshop content in yours yet, so we'll be loading that notebook; to load it, keep following the instructions. Oh — the third step, which I missed, is how we provision Managed Inference: mainly, how we provision an AI model to a Heroku application. Managed Inference is a service that lets you run AI models within the same infrastructure where your application is running, so the data isn't going to third parties. If you're using, say, the OpenAI or Anthropic APIs, the data goes outside your application; this keeps everything running inside the same network your application runs in. To provision it you can use the Heroku CLI, but we're going to make things easy. Once the application is deployed — this one is taking time to deploy, so let's go to something that's already deployed, say this one I created before. You go to your application in the dashboard, click through to the management page, then go to Resources. Resources is where you can provision add-ons — add-ons like Heroku Postgres, the Key-Value Store, or any other third-party item we support on our marketplace. The one we're going to provision is called Heroku Managed Inference and Agents — this is our AI offering. You click on this add-on and then select the model you want to use in your application. We support text-to-text models from Anthropic (Claude 3.5, 3.7, and 4), we support Cohere Embed for embeddings, and we support Stable Image Ultra for image generation. We'll be working mostly with text-to-text inference models, so let's do Claude 4.
We submit, and that's it — now my application can access AI services. That's the only thing you need to do. How does this work? It gives your application access to environment variables containing the API URL, the API key, and the model ID, and with those three things you can use an existing SDK that supports the OpenAI API specification, perform a plain HTTP request, or build your own solution. So now we've provisioned AI into our app — and our other app has finished deploying, so we can do the same with it. Let's go and load the workshop: I'm going to copy this URL — the Jupyter notebook we'll load into the Jupyter template we deployed. Here you go to File → Open from URL, paste that URL, click Open, and voilà, we have our notebook ready to start. Remember that we provisioned the Managed Inference and Agents service into our application, which gives us three environment variables. If I go to Settings → Reveal Config Vars, we see them there: the inference key, the model ID, and the inference URL — the credentials and endpoint we'll use to call the service. So obviously the first step is to load those environment variables in our Jupyter environment. We'll do this workshop in four parts. First: set up the environment — we just load those environment variables so we can continue with the activities. Second: managed inference — this is your basic chat completions endpoint, and we'll see two examples, one doing basic chat completions and one doing a streaming version of that endpoint. Third — one of my favorites — the Heroku tools, the agents part: we have another endpoint that lets you run agents with tools we support natively — code execution, database access, document conversion, executing commands on a dyno — and we're going to continue adding more. It's an endpoint that performs the agent's execution on Heroku infrastructure, on our dynos, using a one-off dyno. What is a one-off dyno? It just spins up, runs the code, and scales back to zero, so you're not paying for an application that's constantly running — you only pay for the compute that the tool took to execute. That endpoint also supports MCPs — MCPs you can deploy to Heroku and attach to your inference agents endpoint. And last but not least, we'll take a look at that MCP support: how you can deploy an MCP to Heroku, attach it to the agents endpoint, and also use those MCPs externally, remotely, through the MCP gateway or MCP toolkits we have. I know it sounds like a lot, but it's going to be easy. I have a couple of exercises for you if you want to write some code, but basically what I'll be doing here is running the code I've already implemented. (A good friend just walked in — thank you for coming!) Okay, first things first: I told you we need to load three environment variables — the inference URL, inference key, and inference model ID — and there's a fourth one we'll use for certain tool executions on Heroku, which is the target application name. What you put in the target application name is the name of the application you deployed — the Heroku Jupyter application. In my case my Jupyter notebook app is called "ai-engineer-workshop," so that's the one I'll use here; I'm giving my tools permission to run commands and perform compute operations on this dyno, on this application. You just replace it with the name of your application so you can run these examples, and later I'll show you how to give access to the application that has the database so you can run the other examples we have for you. So let's load the environment variables now.
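As a rough sketch, loading those config vars in the notebook could look like the snippet below. The variable names follow what the transcript describes (inference URL, key, and model ID, plus a target app name); the fallback values are placeholders for illustration, not real endpoints or credentials:

```python
import os

# Config vars set by the Managed Inference and Agents add-on.
# (Names assumed from the workshop; fallbacks are placeholders only.)
INFERENCE_URL = os.environ.get("INFERENCE_URL", "https://example-inference.heroku.com")
INFERENCE_KEY = os.environ.get("INFERENCE_KEY", "<your-inference-key>")
INFERENCE_MODEL_ID = os.environ.get("INFERENCE_MODEL_ID", "claude-4-sonnet")

# The app that Heroku tools are allowed to act on --
# replace with the name of the Jupyter app you deployed.
TARGET_APP_NAME = os.environ.get("TARGET_APP_NAME", "my-jupyter-workshop")

print(INFERENCE_URL, INFERENCE_MODEL_ID, TARGET_APP_NAME)
```

Using `os.environ.get` with fallbacks keeps the cell runnable locally; on the deployed app the real config vars take precedence.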
The environment variables are ready, so I can continue with the examples. Let's go with managed inference. I mentioned that for managed inference we have the basic chat completions endpoint — the kind of endpoint you'll find on services like OpenAI or Anthropic's Claude. Basically you ask a question — you send a message or an array of messages — and you get a response. This endpoint also supports custom tool execution, but that tool execution happens in your own code: you need to specify the function and capture the information to execute it, or, if you're using an agent framework like an ADK or LangChain, it handles custom tool execution internally. For this example, though, we'll perform basic inference with a basic HTTP request: we have the inference key, and the model ID in the payload — Claude 4, the model I provisioned on my app — and the message I'm sending as the user is "explain the concept of managed inference in one sentence." You can adjust the parameters the OpenAI API supports — temperature, maximum tokens, top-p, etc. You can look at the documentation by clicking on the endpoint path, which takes you to the docs for that endpoint and the parameters we support. After performing the HTTP request, we get the response as JSON. Let's run this and see how it works. We get a response like any other AI endpoint: a chat completions object with the different choices — the messages it returns; here we have just one message, from the assistant — and the response is "managed inference is a cloud service that handles the deployment, scaling, and maintenance of machine learning models for real-time predictions," etc. So that's a basic example — something you can do anywhere; getting inference has become a commodity.
And to show more, we also support streaming — of course, we want a better experience when building our applications, so we need streaming, meaning the service won't return the whole answer as a single JSON object; it starts giving you chunks, so you get real-time feedback while receiving the response from the inference service. Here I'm just consuming chunks and printing them on the screen. The first sample shows the raw version of this request: it uses a server-sent events approach, sending you message events; every message event has data, and each data payload is a piece of that completion object — but here you're getting a delta, a partial version of the message. You see in the content we get nothing, then "streaming is crucial for this," and it keeps going — we get a bunch of different chunks. That was just the raw objects, to understand how the API works; now let's look at how this feels in an application when you're rendering the chunks. Here I have another example where I parse that information, extract the delta content of each response, and at the end render everything as beautiful Markdown. When I execute this, the inference service starts responding in real time, chunk by chunk, with all the messages I'm getting from the service. As you can see it's a Markdown response, so at the end we use the Markdown display object in Jupyter to get the nicely formatted answer — sometimes it'll include a code example, depending on what the inference service returns. That's how this works, and how you can build an application to consume these real-time streams. But now let's talk about the good part: the agents. Who has built an agent before — an application that executes a tool or runs some code? Perfect, we have a couple of hands there.
Amazing. So, these are the tools we support today — and Anush here, as product manager, can tell us what we'll be supporting in the future. First, dyno run command: it lets you execute a command on a dyno — a Unix command or a script you already have pre-deployed on your application — and it gives you the response. This is great for running trusted code: code you've written, that you know works, that has a predictable result; you can have that code on your dyno and execute it through this tool. Then we have database queries — two tools here: postgres get schema and postgres run query. Postgres get schema matters because the LLM doesn't know the shape of your database: if you ask for data it will hallucinate an SQL query, and even if you have some sort of retrieval-augmented generation, or you're grounding the prompt with the shape of your database, it will generate something merely close to the shape of your data — but with this tool it gets exactly what you have in the database. We can then pass that schema to the next tool, run query, which generates a query, runs it on your database, gets the result, and performs inference over that data. Then we have a couple of tools for document transformation: HTML to Markdown and PDF to Markdown. You pass a URL — a website, or a PDF hosted somewhere — and it gives you the Markdown back, so inference can work over that text content; it performs the text extraction for you. And my favorites are the code execution tools — today we support Python, Node, Ruby, and Go. The LLM generates code that runs on Heroku on a one-off dyno and returns the response back to the inference service. These also support dependency installation: the input has a packages array parameter where the LLM automatically says, okay, my Python script is going to use pandas or NumPy, let's install them before I attempt to execute the tool. All of these are essentially MCPs that we natively support, but you can also extend the agents endpoint by deploying your own MCPs. Now let's take a look. The agents endpoint is different from chat completions — it works similarly, with some differences. For example, all of the responses are a stream: it doesn't support a synchronous call that waits to give you the full response, because code execution, database access, and tool execution take time, so we prefer a stream. And depending on the tool you're using, you might need to give the tool access to your database or to an application — for example, to execute a command. So let's try the first one: a demo that runs a command on a Heroku dyno. Here we set up the payload, similar to chat completions: we have the model ID, and the message asks "what is the current date and time on the server?" LLMs don't know anything about real time — they need tools to have context about what's happening right now. I've asked an LLM for current information and gotten answers from 2024, because that's when they stopped training the model; but with this, it knows exactly what time it is, because it's running the command on the server. The tool is a Heroku tool, so I specify the type to run it on Heroku; the tool I'm executing is dyno run command; and the parameters are the target app name — remember, use the name of the application you deployed — the command to run, which is `date`, and a description of what the output of the command means. Then I call the Heroku agents endpoint, passing the payload that includes the tool. This is also a streaming endpoint, so I'm handling the stream output here and extracting the tool calls — this is a standard API shape that other APIs use for tool calling, but here the tools run specifically on Heroku. So let's execute and see the example — and I hope you can run this tool in your own Jupyter notebook. It's running on my "ai-engineer-workshop" application; it executes the dyno run command, waits until the command runs on my app... we get the date and time, and then the inference response. Cool — a basic example, but powerful: instead of running `date`, imagine deploying your own script that does data extraction or connects to external services; you can call it from this endpoint as well. The other one is code execution. With code execution, I ask the inference service to perform an operation and pass the code execution tool; now that it has that tool enabled, it generates code in that language, runs it on Heroku, and gives you the response back. For this example I invite you to change Node to Python, Ruby, or Go to see different responses. I'm a Node developer, even though I have a bunch of Python here — some of you might recognize this is LLM generated, so sorry, not sorry; that's what tools are for. I'm going to execute the code-exec-node tool to answer "what is the 30th Fibonacci number?" — a basic algorithm, I just wanted something easy. I invite you to change this to perform a different operation and see what you can do — like, try to break this thing. For the execution I'm just parsing the response so I get nice Markdown syntax highlighting and everything. So let's perform the operation and it will generate the algorithm. Let's see... okay, it will execute code-exec-node; this is the code I'm passing as the input, this is the nicely highlighted code it will execute — that's JavaScript, yes, that's JavaScript — and at the end it's going to
execute that code on Heroku, and then I get the response back: the Fibonacci number is 233. But I want to do that in Go — so just change from Node to Go, execute, and it does the same, except now the code we see here is Go code: it takes this Go code, compiles it on a Heroku dyno, runs it, and gets you the response back. That looks like Go code to me — I'm not a Go developer, but that definitely looks like Go — and it executed and gave me the same result, plus an explanation: the inference operation over that tool execution. That's cool, but so far we've only called one tool. The good thing about agents is that we can chain calls together, mix and match — we have different agents acting together. So now let's complicate things a little more and use two tools. One is HTML to Markdown — go to a website, do something, and then take that result and use it with the other tool. So we'll use HTML to Markdown and code execution Python, and the prompt is: use the Python snippet from the Wikipedia page for the Euclidean algorithm to calculate the greatest common divisor of 252 and 105.
So it doesn't know the algorithm — I'm telling it directly: go fetch the exact one that's on Wikipedia, then run that code on Heroku and give me the response. I'm enabling those two tools, HTML to Markdown and code execution Python, and let's run it. Now we see multiple tool calls: getting the page as Markdown — it recognized that the Euclidean algorithm page is this one — now it's reading the whole content of the page in Markdown, trying to get the algorithm from it; then it generates the code, then runs the code, and at the end I get the response. This execution takes a little more time because it performs multiple tool runs. Any questions so far? Is anybody following along? If not, you have access to the workshop — you can do it at home with more time, modify the code, analyze it, and make sure the concept is understood. So it got the algorithm here and it executes the thing on Heroku; now I'm just waiting for the last inference step... and that execution is done. There you go: "I found the Python implementation from Wikipedia, I calculated the number you asked for — this is the Python snippet from Wikipedia, this is how I implemented it, and here's the explanation." Beautiful, it works. Now, before I get into the Postgres execution, we need access to the database, and right now the application you have doesn't have access to it. If you want to give it a try: go to the Heroku dashboard — where is the Heroku dashboard — to the AI Engineer World's Fair team, click on the AI engineer data app, go to Resources, and we have one thing here. The Postgres database tools only work on followers, and followers on Heroku are read-only, for security reasons — we don't want to give an LLM tool write access to your database, because of course LLMs aren't trusted for that. If you want to give an agent write access to your production database, I invite you to deploy an MCP
that does that like that postgresql mcp give the access under your own risk the ones we maintain we want to make sure we don't like break production so we have two different databases here the master one or the main one is database don't touch that one the other one that has attachments is the follower so you are going to expand these manage attachments and add your application like julian jupiter and that's it now i gave my application access to the database second you will need to get the name of that database so go to your application in this case is julian jupiter go to resources and this database here ai engineer data in my application is called database so that's the name i'm going to use in code since i am working on ai ai engineer workshop the name is different so i'm going to show you so you can recognize the differences in names this is just the name of the environment variable on heroku that has the connection string so here i have access to that database on this application my database is called heroku postgreSQL aqua so that's the name you are going to get and change it in your code heroku postgreSQL aqua so now i have access to that specific database on my app i cannot access databases from other applications this is for security reasons so this is why i have to do the attachment first to be able to give permissions to my database if the database leaves in your application you don't need to do to do this because your application already has access to it that's the only thing that we need to do here to to set up then we are going to enable two tools postgres get a schema and postgres run query these are going to run on my application remember the target app name and these are going to access that specific database so i'm giving permission to go to those two places so the database that we have here is a database for a solar energy company it contains a table full of metrics with energy consumed and energy produced every hour in kilowatts per hour and we 
have metrics for roughly two months. So what kind of application can we build with this? Take a question like "How much energy has been saved in the last 30 days?" The agent will understand the shape of the database, see that it has a metrics table with these columns, generate a SQL query to fetch that information, run the query, and give you the response. Let's execute that and see how it works.

First step: the model doesn't understand your database yet, so it fetches the schema. It executes the tool, and in a moment you will see the full schema printed on the screen. Now it has the schema of my database: it sees the different tables, metrics with its fields, products, systems, users, and so on. It has all the information it needs. Beautiful.

Next it runs the query. It has already generated a query it is confident will run on my database, because it knows the shape. Here is the response: it got the data, and after that it performs the inference step, extracting the answer from the data, and at the end it reports the information to me. It even wanted to run more queries to get a breakdown of the energy by system. I have three different systems: one performs well, another not so well, and the third is totally horrible. So it is also trying to get more information out of a single question. It is doing inference over my data, and all I did was enable two tools; I didn't need to do anything else.

And there we go, the report: the key metrics, how much you have saved, and the breakdown per system, the best performer, the good performer, and the one with the energy deficit. That is giving an agent access to your data and letting it act on your data. We can add a third tool, say Python code execution, if I want to generate a chart; it will then produce a matplotlib chart from that data, and you can
keep adding more things, though the more agents and tools you add, the longer the whole operation takes. That is an example of how you can mix and match the tools we have here and give agents access to your data. I have a couple of exercises: try to come up with an example that uses a PDF and extracts something from it, and one that runs code; you can do those on your own time.

We are running low on time, so I want to show you now how you can deploy and run MCPs against this same endpoint. Those were the tools we maintain, but what about the tools you are building, and the tools that already exist in the MCP ecosystem? With Managed Inference and Agents, if you go to the dashboard, to the configuration page, you get the model configuration, the toolkit integration, and the MCP server list. This mcp_brave I have here is the Brave Search MCP, and you might already have access to it, so you can attach it to your own Jupyter notebook. To attach an MCP, you click "Manage MCP servers" and attach it as an application, similar to what we did with the database; it's just another Heroku application. And there you go: we have an MCP server that exposes two tools, web search and local search. Now we can use those tools on the agents endpoint, so let's take a look.

Here are the step-by-step instructions for enabling an MCP. After that it's just another tool; this time it is not a Heroku tool, it is an MCP, and this is the name of the tool I will execute. mcp_brave is the namespace; you can have multiple MCPs here, so this works like a namespace for your MCPs, and I am going to execute brave_web_search. It requires an API key, and I already have the key on my application. If you want, you can go ahead and take it, but I'm going to remove that API key in a moment, so don't run like 2,000 queries, otherwise I'm going to get
charged. But now I have access to it, plus the security of scoping those keys to your MCPs, and I will execute it as another tool. The prompt I'm sending is "What is the most recent news about AI agents?" Let's execute this tool. The MCP runs on Heroku, the same as code execution or the Postgres tools: it spins up a dyno, runs the MCP over standard input and output, and scales back to zero, so you are not paying for something that runs constantly. And there we go, that's the result from brave_web_search. This is the tool's response, and now it will render it after the inference operation. So I'm running MCPs on Heroku.

A quick example of how to deploy an MCP; I have an exercise here: deploy an MCP to Heroku. I have a Perplexity Ask MCP that I just forked. It is the official Perplexity MCP, and I forked it into my repository to do exactly one thing: make it Heroku compatible. You use the Procfile, the file on Heroku that defines how an application is executed, and I added the entry point as an MCP. This is a new process type we support: anything that starts with mcp is recognized as an MCP on Heroku, and it will execute this standard-input-and-output code. It also requires an API key that I don't have, but you can still deploy it. To make deployment super easy I added a click-to-deploy button. I deploy this MCP to my space and then attach it to my application; this is my Perplexity MCP, now available to the space. Certain MCPs fail at run time if you don't provide an API key, which is why I'm specifying something; it doesn't need to be valid. This is a Node.js application, so Heroku will build it, deploy it, and keep it available, and you will be able to attach it to your app. That is an exercise you can do, along with writing the code to run the MCP you just deployed. Now we got the response from the previous Brave
search, and this is pretty much the most recent news about AI agents. But remember I told you that you can also use those MCPs outside of Heroku, not only with Heroku agents. Say you are using Cursor or Claude Desktop, or you are writing your own agent on a different platform, but you want those MCPs available remotely: you can use them there too. Here on the management dashboard you see the toolkit integration page. All of the MCPs I deploy to my app become available through this endpoint. It is a server-sent events endpoint, authenticated with a bearer token, and we are working on OAuth support so that you can securely run remote MCP servers on Heroku that are accessible without bearer tokens; we don't like bearer tokens. Perfect.

Now that the MCP is there, let me quickly add my Perplexity one. Let's refresh. Perfect, I have it, it's available. I go to my toolkit and copy the token; it is the same token you use for inference, so really you just need the URL. In my Jupyter code I have a basic MCP client. I need to install the dependency; I'm using the mcp package from Anthropic, and I'm creating an MCP client that connects to my Heroku endpoint, passing that API key as an Authorization header. Let's run it to list the available tools and then execute brave_web_search, so I'm calling the MCP I deployed on Heroku from outside of Heroku. Let's run this demo. It connected, and we have the following tools: brave_local_search, brave_web_search, and perplexity_ask, the one we just deployed and enabled with, mostly, two clicks. It executed the web search. I am not processing the result here, this is just a very basic example, but that's how you can deploy an MCP and use it externally. I also have a sort of MCP gateway that I use for the MCPs I deploy; I use this same approach to use my MCPs on Cursor. And
last but not least, I told you that the chat completions endpoint is compatible with the OpenAI API, about 95 percent so far; we are working to bring it to 99, as there are certain parameters that are not yet supported, but you can just use the SDK. So let's use the SDK to perform a basic operation. I am simply using the OpenAI SDK with the API key and URL from Heroku, I perform an inference, and we should get a response in a moment. And there you go, we have the response.

That's pretty much what we had for you today. You get access to this notebook, so keep playing with it. I extended this team's access to the Heroku platform until the weekend because, right now, unfortunately, we don't have a free tier; we are working hard to bring it back. But you can go deploy and try things out, and if you have any questions, please connect with us on social. We have the Dev Center, our documentation site, the Heroku AI website, and a Heroku community we created on Twitter/X. The slides are on the Slack, so you can get them from there. Thank you very much, and I hope you enjoyed this workshop.
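The Procfile change that made the forked Perplexity Ask server Heroku compatible is a single line: a process type whose name starts with mcp, pointing at the server's stdio entry point. The `node dist/index.js` command below is illustrative; use whatever start command the MCP server you are deploying actually documents.

```
mcp: node dist/index.js
```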
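As a take-home reference, the Postgres tool setup from earlier in the workshop can be sketched as a request body for the agents endpoint. The tool names (postgres_get_schema, postgres_run_query) are the ones shown on screen, but the surrounding schema (the "heroku_tool" type, the runtime_params and db_attachment keys, the model id) is my reconstruction; confirm the exact format in the Managed Inference and Agents docs before relying on it.

```python
import json

def build_agent_payload(question, target_app, db_attachment="DATABASE"):
    """Sketch of an agents-endpoint body enabling the two Postgres tools.

    The key names below are assumptions based on the workshop walkthrough,
    not a verified schema.
    """
    return {
        "model": "claude-4-sonnet",  # placeholder model id
        "messages": [{"role": "user", "content": question}],
        "tools": [
            {
                "type": "heroku_tool",
                "name": name,
                "runtime_params": {
                    # The app the tools run on, and the attachment name of
                    # the database they may touch (DATABASE, or e.g.
                    # HEROKU_POSTGRESQL_AQUA for an attached follower).
                    "target_app_name": target_app,
                    "tool_params": {"db_attachment": db_attachment},
                },
            }
            for name in ("postgres_get_schema", "postgres_run_query")
        ],
    }

payload = build_agent_payload(
    "How much energy has been saved in the last 30 days?",
    target_app="julian-jupyter",
)
print(json.dumps(payload, indent=2))
```

The point of the shape: the model only ever sees two tools, and each tool is pinned to one app and one database attachment, which is why attaching the follower first was required.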
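The loop we watched (get the schema, generate SQL, run it, then do inference over the result) arrives as a stream of server-sent events. A minimal parser for such a stream might look like this; the event field names in the sample ("object", "tool", "content") are purely illustrative, so inspect a real response to see the actual payload shape.

```python
import json

def parse_sse_events(raw: str):
    """Yield decoded JSON objects from an SSE response body.

    Skips blank lines and the conventional "[DONE]" terminator.
    """
    for line in raw.splitlines():
        line = line.strip()
        if line.startswith("data:"):
            data = line[len("data:"):].strip()
            if data and data != "[DONE]":
                yield json.loads(data)

# Illustrative sample only -- not a captured Heroku response.
sample = (
    'data: {"object": "tool.completion", "tool": "postgres_get_schema"}\n'
    "\n"
    'data: {"object": "chat.completion", "content": "Here is your savings report."}\n'
    "data: [DONE]\n"
)
events = list(parse_sse_events(sample))
for event in events:
    print(event["object"])
```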
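Calling an attached MCP from the agents endpoint looked the same as the Heroku tools, just with a namespaced tool name. The sketch below assumes a `"mcp"` tool type and a `namespace/tool` naming scheme based on what the workshop showed (mcp_brave plus brave_web_search); the toolkit page lists each tool's real fully qualified name, so check there for the exact separator and type.

```python
import json

def mcp_tool(namespace: str, tool: str) -> dict:
    # Hypothetical fully qualified name: "<namespace>/<tool>".
    return {"type": "mcp", "name": f"{namespace}/{tool}"}

request_body = {
    "model": "claude-4-sonnet",  # placeholder model id
    "messages": [
        {"role": "user", "content": "What is the most recent news about AI agents?"}
    ],
    "tools": [mcp_tool("mcp_brave", "brave_web_search")],
}
print(json.dumps(request_body, indent=2))
```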
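The basic MCP client cell from the notebook can be reconstructed roughly like this, using the official `mcp` Python package and its SSE transport. INFERENCE_KEY is the token the add-on attaches (the same one used for inference); HEROKU_MCP_SSE_URL is my own placeholder name for the URL you copy from the toolkit integration page, not a variable Heroku sets for you.

```python
import asyncio
import os

def auth_headers(api_key: str) -> dict:
    # The toolkit SSE endpoint is protected by the same bearer token
    # you use for inference, sent as an Authorization header.
    return {"Authorization": f"Bearer {api_key}"}

async def call_remote_mcp() -> None:
    # Requires: pip install mcp
    from mcp import ClientSession
    from mcp.client.sse import sse_client

    url = os.environ["HEROKU_MCP_SSE_URL"]  # paste from the toolkit page
    key = os.environ["INFERENCE_KEY"]
    async with sse_client(url, headers=auth_headers(key)) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])
            result = await session.call_tool(
                "brave_web_search", {"query": "latest news about AI agents"}
            )
            print(result.content)

# Only attempt the network call when the endpoint URL is configured.
if "HEROKU_MCP_SSE_URL" in os.environ:
    asyncio.run(call_remote_mcp())
```

This is the same client shape you would wire into Cursor or any other agent runtime that speaks MCP over SSE.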
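Finally, the OpenAI-SDK compatibility demo reduces to pointing the stock client at Heroku. The config var names below (INFERENCE_KEY, INFERENCE_URL, INFERENCE_MODEL_ID) are the ones the Managed Inference add-on attaches; the `/v1` suffix on the base URL is how the workshop notebook reached the chat completions route, so adjust if your endpoint differs.

```python
import os

def heroku_openai_kwargs(env) -> dict:
    # Map Heroku's inference config vars onto the OpenAI client's arguments.
    return {
        "api_key": env["INFERENCE_KEY"],
        "base_url": env["INFERENCE_URL"].rstrip("/") + "/v1",
    }

# Only perform the live call when the add-on's config vars are present.
if "INFERENCE_KEY" in os.environ:
    # Requires: pip install openai
    from openai import OpenAI

    client = OpenAI(**heroku_openai_kwargs(os.environ))
    resp = client.chat.completions.create(
        model=os.environ["INFERENCE_MODEL_ID"],
        messages=[{"role": "user", "content": "Say hello from the workshop"}],
    )
    print(resp.choices[0].message.content)
```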