How agents will unlock the $500B promise of AI

00:00:00.000 | .

00:00:01.000 | Yes, my name is Donald.

00:00:16.340 | I lead the new product teams at Retool.

00:00:20.360 | Retool made its name in the earlier days

00:00:23.120 | working on internal tools, making it really

00:00:24.840 | easy for any business out there to build internal applications.

00:00:28.520 | And we've been making it easy to connect with AI providers

00:00:32.140 | for a couple years now, but we're now

00:00:34.520 | breaking into agentic AI with the release of Retool Agents,

00:00:37.960 | which we announced last week and made available to our customers.

00:00:41.840 | So half a trillion dollars has been spent on AI infrastructure,

00:00:47.280 | and yet most large companies are really just still

00:00:50.840 | stuck with toy chat bots and messing around with code

00:00:54.760 | generation.

00:00:55.560 | So let's talk about why that changes this year with enterprises

00:00:59.180 | finally being able to build agents with guardrails that plug into real production systems.

00:01:05.880 | Reuters shared last week that Anthropic hit at the end of May, so a couple days ago,

00:01:11.180 | $3 billion in annualized revenue.

00:01:14.800 | That's up from $2 billion at the end of March and $1 billion in December.

00:01:19.300 | So that's 3x in their annualized revenue growth in five months, which is some staggering growth.

00:01:24.800 | That's not to mention open AI is slated to end 2025 at $12 billion in revenue over 3x where they were at at the end of last year.

00:01:32.180 | These growth rates are massive, and this largely is fueled by enterprise AI spend.

00:01:37.800 | And coding is growing.

00:01:42.420 | Teams love using Cursor and WindSurf, including my own.

00:01:45.420 | I think every engineer on my team is using one of these tools.

00:01:48.500 | And engineers are now becoming experts in prompting and in code review and letting LLMs do the heavy lifting of a lot of day-to-day coding.

00:01:56.420 | Their workflows really are just completely transformed right now and their productivity is through the roof.

00:02:04.040 | If you look at OpenRouter, which gives access to a unified API that exposes hundreds of AI models,

00:02:12.040 | their top apps list is really dominated by code generation use cases, as you can see here.

00:02:20.040 | And the LLM providers are taking note.

00:02:23.920 | SWE Bench Verified is a benchmark that measures an AI model's ability to perform real-world coding tasks.

00:02:30.660 | If you look at GPT 4.1, it's up 21 percentage points from GPT 4.0,

00:02:37.300 | really showing the investment that OpenAI is putting behind making their models work really well for coding use cases.

00:02:44.420 | And Gemini 2.5 Pro is up another 9 percentage points from GPT 4.1.

00:02:49.520 | Devs are raving about Gemini 2.5 Pro.

00:02:52.480 | I think nearly every developer I know using Cursor is talking about how well it works.

00:02:57.480 | And finally, the term vibe coding has firmly planted itself in the zeitgeist.

00:03:03.960 | Last week on the Andreessen Horowitz podcast, Rick Rubin, the legendary music producer, said,

00:03:09.020 | "Vibe coding is the punk rock of software," talking about in the same way that punk rock,

00:03:14.640 | with its simplicity, made it really easy for anyone who had something to say to go make a song.

00:03:19.000 | Vibe coding is doing that now for anyone with an idea.

00:03:21.960 | And Vibe coding is so powerful because you just tell Cursor or Windsurf kind of the gist of what you want.

00:03:30.960 | And it goes, and it thinks, and it thinks, and it acts, and it writes that code for you.

00:03:34.960 | And this is a lot different than basic text completions or copying code from ChatGPT into your code editor.

00:03:42.960 | This is agentic AI.

00:03:43.920 | So Vibe coding needs agents to work.

00:03:48.920 | But why should we stop with this idea at just code?

00:03:53.920 | Code is testable.

00:03:54.920 | It has semantics.

00:03:55.920 | It's easy to validate and understand if the LLM is generating it correctly.

00:03:59.920 | But could we apply the same idea to any problem in our business?

00:04:04.880 | And to do that, we would need general-purpose agents.

00:04:07.880 | And building the agent, believe it or not, I would say is actually the easy part.

00:04:13.880 | You could build a really basic agent in about 100 lines of JavaScript or Python at the start of one right here.

00:04:18.880 | And what I'm talking about here is using the React framework, which is basically a framework for building agents that instructs the agent to reason, act, reason, act,

00:04:30.840 | until it determines that it's come up with a final answer.

00:04:33.840 | And the agent has access to tools, which are basically a set of functions.

00:04:37.840 | These could be external services it's calling, code in your code base that it's running.

00:04:42.840 | So effectively, an agent is just an LLM wrapped in an execution loop that can read, decide, call tools, and self verify.

00:04:52.800 | So here you see, like I said, I have the start of a basic agent.

00:04:56.800 | I'm defining a set of tools for that agent.

00:04:58.800 | In this case, it has one.

00:04:59.800 | It's a calculator, as well as a function to actually calculate something.

00:05:03.800 | I initialize a system prompt here for the agent using the React framework.

00:05:09.800 | And I know there's a lot here, but basically what this is, is I'm defining that agent loop.

00:05:16.760 | It's a for loop, like I'm sure many of us learned in CS101, and a number of maximum iterations,

00:05:22.760 | so our agent can't get stuck in a loop thinking forever, burning up our OpenAI costs.

00:05:27.760 | The LLM tells our logic when it decides that a tool needs to be invoked.

00:05:32.760 | We call that tool, we pass the result back to the LLM, and it decides when a final answer has been reached.

00:05:39.760 | We detect that, and we spit it back out to the user.

00:05:43.720 | So, building agents is easy, right?

00:05:45.720 | We can all just go build agents at our company, and problem solved, right?

00:05:49.720 | Not so fast.

00:05:50.720 | Just like VibeCoding, agents are tough to get into production in the same way that, say, a web app that you build in Cursor

00:05:58.720 | really quickly is tough to get into production.

00:06:01.720 | You have a lot of things at a real enterprise company that you're probably concerned with here.

00:06:05.680 | Things like single sign-on, role-based access control, integrating with external services in a secure way.

00:06:12.640 | Maybe you care about audit logs.

00:06:14.640 | Maybe you care about compliance, like SOC 2.

00:06:16.640 | Maybe you use AWS Secrets Manager.

00:06:18.640 | Maybe you are a multinational corporation.

00:06:20.640 | It needs to be internationalized.

00:06:21.640 | The list goes on, and you can't always safely VibeCode these things.

00:06:26.640 | The information released an article last week on the high risks of using VibeCoded logic in production,

00:06:35.600 | and a couple real-world use cases of vulnerabilities that were put into production by developers,

00:06:41.600 | not carefully vetting AI-generated code.

00:06:46.600 | We've also learned firsthand at Retool that there is a lot that you really have to get right when you build agents.

00:06:52.560 | Models can hallucinate or give you unpredictable results or inaccurate results, made-up results.

00:06:58.560 | You have to be mindful of security.

00:07:01.560 | You have to be conscious of the things that you're giving your agent access to.

00:07:07.560 | You have to be cognizant of cost overruns.

00:07:10.560 | It can be really easy to accidentally burn up a bunch of tokens.

00:07:14.560 | Overall, evals are really an important safeguard here in making your non-deterministic agent as deterministic as you can.

00:07:24.520 | So how do you solve that problem?

00:07:27.520 | I would group the options into approximately four buckets.

00:07:31.520 | The first is to build your agent from scratch.

00:07:34.520 | You write every line of code by hand.

00:07:36.520 | Maybe you're fine-tuning LLMs.

00:07:38.520 | Maybe you have AI/ML engineers on your team.

00:07:41.520 | You have full control, but it's a high lift.

00:07:44.520 | You're building all those ancillary pieces, but what you get is something purpose-built.

00:07:49.480 | It's not outsourced.

00:07:51.480 | You have maximal control.

00:07:53.480 | Then there's more of a middle ground using a framework like, say, LangGraph.

00:07:56.480 | You still have a high level of control, for example, different memory modes.

00:07:59.480 | It's a medium lift, but a pretty flexible framework that you're tied to.

00:08:03.480 | There's agent platforms like Retool Agents where you would get opinionated defaults, low lift to production.

00:08:10.440 | Of course, you're tied to the platform, but it's useful for that long tail of business agents.

00:08:15.440 | The hosting is abstracted for you.

00:08:17.440 | Connectors to external services come out of the box.

00:08:19.480 | Observability for your fleet.

00:08:21.480 | Or the fourth bucket is the verticalized agent bucket.

00:08:25.480 | These are offerings where the agent is really dialed in for one use case.

00:08:30.480 | It can do one thing really well, but you really have minimal flexibility to kind of go beyond that one core use case.

00:08:39.480 | So how do you decide?

00:08:40.480 | Everyone wants agents, but you have to be really thoughtful about where you spend those precious engineering cycles.

00:08:45.480 | When should you hand roll an agent versus when would you want to consider a managed agent platform?

00:08:50.480 | Ultimately, I would say the decision boils down to an engineering decision of trade-offs.

00:08:55.480 | If you're working on something that's part of your core product or gives your business its competitive edge, then you probably want to build it yourself.

00:09:03.480 | If you are working with, say, regulated or sensitive data, maybe you have hard SLAs of some sort, you might want to consider both options.

00:09:11.480 | But if you're building some kind of commodity workflow and you need it in days and not quarters, then I would probably buy it.

00:09:19.480 | I would also, as a part of this, do a risk assessment of either option.

00:09:22.480 | You know, do you want your engineers debugging business logic or do you want them up at 2:00 AM trying to figure out why OAuth isn't working right?

00:09:31.480 | As a part of this decision, if you go the managed platform route, I would evaluate the breadth of connectors that the offering connects to.

00:09:40.480 | You know, are you pulling data from Salesforce and Databricks and Snowflake?

00:09:44.480 | Is that going to come out of the box or do you have to build that?

00:09:47.480 | Is permissioning built in?

00:09:49.480 | Is it compliant?

00:09:50.480 | Does it come with audit trails?

00:09:52.480 | Is observability built in?

00:09:54.480 | Are evals built in?

00:09:55.480 | Or is that another vendor that you're going to have to go now pay for?

00:09:59.480 | And I think overall, on the build versus buy decision, I would think about the token costs, the infrastructure costs, and the engineering costs that come into play for building or buying.

00:10:12.480 | On observability, this is how we think about it at Retool for agents.

00:10:16.480 | It's important overall, I would say, with whatever platform you go with to understand token usage, estimated costs, and runtime information for your agent.

00:10:25.480 | And with whatever platform you choose, you should also be able to dial into any specific agent and agent run to make sure that your fleet of agents is doing what you would expect it to.

00:10:38.480 | So, looking ahead, there's an analogy here, I would say, to how businesses today think about building versus buying software.

00:10:49.480 | Stripe, for example, is always going to have its core billing logic and its critical user-facing apps built by hand.

00:10:57.480 | But Stripe uses external platforms for that long tail of software.

00:11:02.480 | And I would expect the same for agents.

00:11:04.480 | I would expect businesses, as time goes on, to have a few hand-built agents, purpose-built for certain use cases, and then a long tail for business use cases hosted on some kind of platform.

00:11:16.480 | To look again at Stripe, they use React for much of their critical customer-facing software, and they use Retool for much of their internal tooling.

00:11:25.480 | Or you could say, look at Cursor.

00:11:26.480 | Cursor would never use a managed platform for their core product.

00:11:30.480 | You know, this is their core product that we're talking about.

00:11:32.480 | It would be slow to use a different provider.

00:11:34.480 | They wouldn't own it.

00:11:35.480 | They really need as much control as possible.

00:11:37.480 | And they have a lot of really smart engineers kind of pouring over every edge of that thing.

00:11:42.480 | But you could imagine that as Cursor, the company grows, which they are, they may eventually be dealing with a high volume of, say, fighting chargebacks against their billing provider, many customer support requests.

00:11:54.480 | I could imagine Cursor, their company, moving towards using an agent platform as they get quite large.

00:12:03.480 | I've been working closely with customers like AWS on initiatives to automate mundane business processes with AI, and I've really seen the impact here.

00:12:11.480 | Another Retool customer, ClickUp, built their AI tooling on Retool.

00:12:14.480 | They saved over $200,000 in vendor costs and hundreds of thousands of dollars on additional headcount.

00:12:20.480 | Descript estimated that they're saving hundreds of hours of work weekly with the 50 apps they built.

00:12:26.480 | And in fact, we recently announced at Retool on the topic of work automation that our customers have automated over 100 million hours of work to date.

00:12:34.480 | By doing this, we're freeing human potential for more creative and strategic endeavors.

00:12:40.480 | You know, people thought that printing press was going to lead to the decline of traditional knowledge, and in fact, it democratized the access of information.

00:12:48.480 | And I really do think that AI and agents are going to enable businesses to enhance the capabilities of their people and of their teams.

00:12:55.480 | And this is just going to unlock limitless potential and, I would say, overall, just increase the GDP of the world.

00:13:01.480 | Last week, Mary Meeker's AI trends report came out, and it was reported that inference cost is dropping dramatically.

00:13:12.480 | From 2022 to 2024, cost per token dropped 99.7%.

00:13:19.480 | And spend is huge, as we saw with Anthropix, 3x in their annualized revenue in five months, and OpenAI is $12 billion by the end of this year.

00:13:28.480 | While the marginal cost, as we can see here, is completely bottoming out.

00:13:32.480 | For example, at Retool, for our cheapest agent, we charge $3 an hour.

00:13:36.480 | You can imagine that cost is going to keep dropping.

00:13:39.480 | The Meeker report also showed that Google searches for AI agents 11xed in the last 16 months.

00:13:46.480 | So you can expect to keep hearing about agents.

00:13:48.480 | So in closing, I would say the question isn't, what is the single golden ticket way to put everything in my business on autopilot?

00:13:55.480 | It's, where can I help my engineers create the most leverage?

00:13:59.480 | And what's the right tool for the job?

00:14:01.480 | Thank you.

00:14:07.480 | I think we have two, three minutes for questions.

00:14:17.480 | Yeah?

00:14:18.480 | First of all, thank you for the talk.

00:14:19.480 | Yeah.

00:14:20.480 | That was really good.

00:14:21.480 | I was curious, like, this essentially paradigm of, for core business logic, build your own tools, whereas, you know, for more ancillary stuff, look to things like Retool agents.

00:14:34.480 | Was this like a philosophy that you guys had basically figured out, like, while working on this stuff internally to Retool?

00:14:41.480 | And if so, like, what's an example of, like, Retool's core internal logic that they want to build themselves?

00:14:48.480 | And what's something they might look to use their own product for, their own agent for?

00:14:52.480 | That's a really good question.

00:14:53.480 | I think, like, this is, like, generally a philosophy of Retool we have.

00:14:56.480 | Just, you know, we build a lot of our own internal software on Retool.

00:15:00.480 | Of course, we're, like, dogfooding as much as we can.

00:15:03.480 | In terms of your second question, I would say it's a great question.

00:15:07.480 | Agents released last week, like I said.

00:15:09.480 | So we're building as much as we can on it.

00:15:11.480 | I think it remains to be seen what we'll do on the platform and what we'll build by hand.

00:15:15.480 | I think just our philosophy is to do as much as we possibly can using our own platform.

00:15:20.480 | And if we can't do something, then we should go figure out why and go build it.

00:15:23.480 | And so I think for us specifically, I would say we're just going to use the platform itself for everything we possibly can.

00:15:29.480 | Thanks for the question.

00:15:33.480 | Hey, Donald.

00:15:34.480 | Lance from Iox.

00:15:35.480 | Hey.

00:15:36.480 | So we build applications for government and NGO and stuff.

00:15:39.480 | And I'm curious about your AI agents.

00:15:41.480 | Do you allow your on-prem offering to include the AI agents as well?

00:15:46.480 | We do.

00:15:47.480 | We do.

00:15:48.480 | So we launch cloud only, but on-prem support is coming in the next, like, week or two, maybe three.

00:15:53.480 | So, yes, it is definitely going to be supported on-prem and also eventually for our air-gapped customers as well.

00:16:00.480 | Thank you.

00:16:01.480 | Any other questions?

00:16:06.480 | Cool.

00:16:07.480 | Well, thank you, everyone.

00:16:12.480 | Thank you.

00:16:13.480 | Thank you.

00:16:14.480 | Thank you.

00:16:15.480 | Thank you.

00:16:16.480 | Thank you.

00:16:17.480 | I'll see you next time.

How agents will unlock the $500B promise of AI - Donald Hruska, Retool