AI Automation that actually works: $100M, messy data, zero surprises

- Cool. I was hoping to do this presentation together with the enterprise partner we've been working with, but we couldn't get legal clearance in time, so just imagine somebody else up here with me. They're a public healthcare company, and we'll talk about an automation use case and what it took to make AI reliably solve a problem with a pretty serious impact.

The context here is a large public healthcare company that builds software for radiologists and radiology clinics. When patients want to schedule appointments, they call operators and say, "Hey, I'm male, I'm over 30, I have this symptom. This is my health insurance provider. This is my vendor."

What the operators on the phone do is figure out the right procedure code for that patient. They do a bunch of data entry, and they try to get the right appointment scheduled, and so on.

So that's the lay of the land. What they observed is that this call takes about 12 to 15 minutes, and every three minutes you can shave off it has about a $50 million impact. Part of that comes from handling more calls out of a center that services thousands of clinics, in the U.S. and around the world, mostly in Europe.

The more time you cut, the more business impact you get: you can schedule more appointments, and you also lower the cost of training operators, and retraining them, to take these calls.

Now, the big challenge. I had a nice slide to show you how complicated it is for an operator to schedule a call. Instead, imagine the worst, most complicated enterprise UI screen you have ever seen.

Just the worst. It's got something like 15 tabs, so as you're filling in data on the call, you switch to the next tab, fill in another screen, fill in some more data there. It's a pretty nightmarish situation for the operator who's trying to fill all this in.

The big challenge isn't just navigating this UI and filling in the data. What it comes down to is that you have to figure out the right procedure code for the patient before you schedule them in.

Different situations require different medical procedure codes. For example, you might have a mammogram that's unilateral, or bilateral, or with an ultrasound baked in. But maybe the patient needs wheelchair assistance, and then the code becomes something different, like MM123WA.

It changes, and the set of codes changes. It doesn't just depend on the patient who comes in, their symptoms, age, gender, and so on, but also on whether the patient has already been to the clinic before. It also depends on state, federal, and local regulation.

It even depends on whether the clinic decides it doesn't want to work after 3 p.m., in which case you can't schedule an appointment at that clinic after then. The number of permutations and combinations is explosive. And the last bit that makes it really shitty is that nobody even agrees on the same set of procedure codes.

So you can't even sit down and say, "Aha, this is the space of all the codes, and that's what we're going to analyze and build a decision tree on." Nope. A different clinic family comes in, and they have a different set of codes. Some have 250 codes just for mammograms; some have 5.
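To make the "no shared code space" point concrete, here is a minimal sketch, assuming a made-up catalog format: each clinic family maps a request to its own code, and a wheelchair-assistance modifier changes the code. The `WA` suffix echoes the MM123WA example; the catalogs, the other codes, and `select_code` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical catalogs: each clinic family has its own codes for similar
# requests, so there is no single code space to build one decision tree on.
CATALOGS: dict[str, dict[str, str]] = {
    "clinic_family_a": {
        "mammogram": "MM123",
        "mammogram_bilateral": "MM124",
        "mammogram_with_ultrasound": "MM125",
    },
    "clinic_family_b": {
        "mammogram": "MAM-01",
    },
}


@dataclass(frozen=True)
class Request:
    clinic_family: str
    procedure: str
    needs_wheelchair: bool = False


def select_code(req: Request) -> str:
    """Look up the clinic family's own code, then apply modifiers."""
    catalog = CATALOGS[req.clinic_family]
    if req.procedure not in catalog:
        raise KeyError(f"{req.clinic_family} has no code for {req.procedure!r}")
    code = catalog[req.procedure]
    # Hypothetical modifier rule: wheelchair assistance appends "WA".
    if req.needs_wheelchair:
        code += "WA"
    return code


print(select_code(Request("clinic_family_a", "mammogram", needs_wheelchair=True)))  # MM123WA
```

Every other factor mentioned (prior visits, state, federal and local regulation, clinic hours) would be another branch layered on top of this lookup, separately for each clinic family, which is where the combinatorial explosion comes from.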

I've honestly been shocked over the last few weeks looking at this entire space, and I'm like, "Oh ho, this is why it sucks so much." So that's the situation. The way their stack is set up, there are three players in this.

First there's the operator, the person taking the calls, whose job we're trying to replace, so it's even worse for them. The second player is the developer who's building this very complicated software.

And it's not on them; this is just a really complicated thing to do. It's not like you can Material UI this, or shadcn this (I just dated myself), and have a fancy new UI that solves the problem.

It's genuinely a really complicated problem. So the developers have to code every edge case up. And what do you do? You convert every case into configuration. If a clinic has variable timings, you create a configuration block called a timing block.

Then you work out what the right time is for that clinic to schedule appointments, and so on. The third player is the administrator at every clinic: a non-technical person who actually knows all of the clinic's rules.
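A configuration block of this kind might look roughly like the following sketch. `TimingBlock`, `ClinicConfig`, and `can_schedule` are hypothetical names, not the company's actual schema; the point is that each new clinic rule (no appointments after 3 p.m., closed on Fridays) becomes another field someone has to code, surface in the UI, and train operators on.

```python
from dataclasses import dataclass, field
from datetime import datetime, time


@dataclass
class TimingBlock:
    """Hypothetical per-clinic timing configuration."""
    opens: time
    closes: time
    closed_weekdays: set[int] = field(default_factory=set)  # Monday=0 ... Sunday=6

    def allows(self, slot: datetime) -> bool:
        if slot.weekday() in self.closed_weekdays:
            return False
        return self.opens <= slot.time() < self.closes


@dataclass
class ClinicConfig:
    name: str
    timing: TimingBlock


def can_schedule(clinic: ClinicConfig, slot: datetime) -> bool:
    return clinic.timing.allows(slot)


# A clinic that stops at 3 p.m. and doesn't work Fridays.
clinic = ClinicConfig("example clinic", TimingBlock(time(8, 0), time(15, 0), {4}))
print(can_schedule(clinic, datetime(2024, 1, 3, 10, 0)))  # Wednesday 10:00 -> True
print(can_schedule(clinic, datetime(2024, 1, 3, 16, 0)))  # Wednesday 16:00 -> False
print(can_schedule(clinic, datetime(2024, 1, 5, 10, 0)))  # Friday 10:00 -> False
```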

They know everything, but they can't build it. So you end up with a config explosion problem and a training burden: every time you add configuration, you have to train the operators to do more with it.

You also have a lot of uncoded rules and business logic that nobody wants to code up. Nobody wants to code up "this clinic in Chicago doesn't like to work on Fridays." It's not a thing. Well, it's probably a thing in the Bay Area, I guess, but not in Chicago.

What this nets out to is that every time you encode a new rule, it costs more than the benefit you get, so we just offload that work onto the operators through training.

And this leads to what I call the automation paradox: the people who understand the rules can't code the automation, and the people who can code the automation don't understand the rules. Can't, don't, won't, whatever. Developers, right?

They don't want to go out into the field and do real work; they'd rather just go code something. So the AI idea, and this is going to be a shocker for you all, is: what if the non-technical people could write and update these algorithms in natural language?

What does it take to make that happen, so that we can cut the developer out of the loop and just have admins vibe code, in production? That's what we've been working on for the last few months, and I want to share what we've done. I have a live demo as well, but the internet is janky, so I'll play a recording I just made in the corner.

So, the challenges. The first is what I call the language problem. The business user has a language that seems very obvious to them, and if you give it to a stock LLM, the result, once deployed, might be MRI machines catching fire. No, that won't actually happen, thankfully. We're not doing that work yet; we're just starting another project that gets closer to it, but we're staying away from the actual scanning, so nothing will catch fire.

The real problem is that the LLM doesn't speak your business language; it speaks programming languages. You can tell it to do React-y things, or Rust-y things, or JavaScript or TypeScript things, but it's hard to convey your specific terminology: what you want done, how it should work, and how your environment translates into intent.
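One common way to give a model your business language is a glossary that pins each term to a precise meaning before the request reaches the model. A minimal sketch, assuming a hypothetical `GLOSSARY` whose terms and definitions are invented for illustration; a real semantic layer would also map terms to the underlying tables, fields, and rules.

```python
# Hypothetical glossary: business terms pinned to precise, checkable meanings.
GLOSSARY: dict[str, str] = {
    "new patient": "a patient with no prior appointment at this clinic",
    "after hours": "any time outside the clinic's configured opening hours",
    "wheelchair assistance": "requires the wheelchair variant of the procedure code",
}


def terms_used(request: str) -> list[str]:
    text = request.lower()
    return [term for term in GLOSSARY if term in text]


def build_context(request: str) -> str:
    """Prepend definitions for any glossary terms the request mentions."""
    lines = [f"- {term}: {GLOSSARY[term]}" for term in terms_used(request)]
    header = "Definitions:\n" + "\n".join(lines) + "\n\n" if lines else ""
    return header + "Request: " + request


print(build_context("Can a new patient book after hours?"))
```

The model then sees the same definitions every time a term is used, instead of guessing what "after hours" means at this particular clinic.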

There are two other problems that I'd call non-AI problems, but they're very important to set up. One is the DevOps problem: what even is the SDLC for a non-technical user? What do review, staging, production, fixing, and troubleshooting look like when you're non-technical? That's kind of weird.

The second is the security problem. Cool, we gave these non-technical users a way to write whatever business logic they want on the fly. What if it causes a massive data breach or a security leak? Then you might as well shut down.

So those are the things that need to be solved in an opinionated way. All right, the solution. Suppose you're a company called Acme. What developers do today at Acme, or at this healthcare company, is take their tribal knowledge and know-how, talk to a foundation model with assisted tooling (depending on their tool of choice), and generate programs. And programs do things.

Instead, what if the non-technical user spoke to a model that had been taught the language of your domain? That model would generate a plan in a language we'll call AcmeQL (surprise: PromptQL), or whatever-your-company-QL. This AcmeQL plan is a program: a deterministic artifact that can actually be executed. That's the bridge between the business user, the AI, and what actually gets executed, deterministically. And then it's run.

So once that work is done, the plan runs programmatically. The hard part becomes: how do we encode whatever practices we have, everything from procedural semantics to ontologies to entities to specifics, into a model? How do we teach that to a model so that it starts generating things that make sense to a business user?
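The talk doesn't show what an AcmeQL plan looks like. As a rough sketch, assuming a plan is just an ordered list of steps drawn from a fixed set of vetted operations (the step format and operation names below are hypothetical, not PromptQL's format), the model's job ends at producing the plan, and running it is ordinary deterministic code.

```python
from typing import Any, Callable

# Hypothetical vetted operations. The model may only compose these; it cannot
# introduce new code into the plan.
OPERATIONS: dict[str, Callable[..., Any]] = {
    "filter_eq": lambda rows, field, value: [r for r in rows if r[field] == value],
    "sort_desc": lambda rows, field: sorted(rows, key=lambda r: r[field], reverse=True),
    "first": lambda rows: rows[0] if rows else None,
}


def run_plan(plan: list[dict[str, Any]], data: Any) -> Any:
    """Execute steps in order; the same plan on the same data gives the same result."""
    result = data
    for step in plan:
        op = OPERATIONS[step["op"]]  # unknown operations raise KeyError
        result = op(result, **step.get("args", {}))
    return result


# A plan the model might emit for "the clinic-a row with the most open slots".
plan = [
    {"op": "filter_eq", "args": {"field": "clinic", "value": "a"}},
    {"op": "sort_desc", "args": {"field": "slots"}},
    {"op": "first"},
]
rows = [
    {"clinic": "a", "slots": 3},
    {"clinic": "b", "slots": 5},
    {"clinic": "a", "slots": 7},
]
print(run_plan(plan, rows))  # {'clinic': 'a', 'slots': 7}
```

Because the plan is data rather than free-form code, it can be reviewed, stored, and re-run, and the same plan on the same data always gives the same answer.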

Let me load up a demo and see what this looks like. Let's hit play here, and I want to keep my scroller. As a business user, I start with something very simple that says... can you folks see in the back?

No? All right, let me go to a past thread and show you what a conversation actually looks like. I built this demo on GitHub, taking a situation where we want to dynamically reassign who gets to support a particular GitHub issue when it comes in.

It's an equivalent problem statement, with the same kind of weird business rules and logic you need to actually make it work. For example, I go in and say: depending on this business logic, or these rules, this is the person who should be assigned.

So, as a business user, I start the conversation with: given an issue description, something like "data pipelines are not working," use AI to find the most relevant file (it has to go through a bunch of files), and then find the top contributor to that file.

Straightforward, as long as you're connected to the data and you understand what all of these words mean, which is a big part of setting up the semantic layer and so on. Then it goes off, does its thing, and gives me a response: "We identified analytics pipeline.py." This is the GitHub issue... sorry, that's not the repo. The repo I set up for this is a sample repo.

It has a bunch of files, and from those files I want it to find the right file for the issue and, from that file, the top contributor. That top contributor gets found, and now I actually have some confidence: "Okay, this is sort of working. I've done this work myself."

The next thing we do is introduce a primitive called automations. Automations are a primitive in AcmeQL. I say, "Convert this to an automation." And this is the only technical thing I ever deal with as a business user.

I say, "The input has a field called description, and the output should have a field called name." That's it. I don't care beyond that. There's an input and an output, and I'm going to stop thinking about what my business logic does. It then converts it into an automation and runs a bunch of tests.
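Stripped of the AI, the automation being described is a function with one input field and one output field. Here is a runnable sketch under clearly hypothetical assumptions: the repo data is an in-memory dict of invented file names and commit authors, and the "find the most relevant file using AI" step is replaced by crude keyword overlap so the example stands on its own.

```python
import re
from collections import Counter
from typing import TypedDict


class Input(TypedDict):
    description: str


class Output(TypedDict):
    name: str


# Hypothetical repo data: file path -> author of each commit to that file.
COMMITS: dict[str, list[str]] = {
    "analytics_pipeline.py": ["alice", "alice", "bob"],
    "k8s/pod_scaling.yaml": ["carol", "dave", "carol"],
}


def tokens(text: str) -> set[str]:
    # Lowercase words with a crude plural strip ("pipelines" -> "pipeline").
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w[:-1] if w.endswith("s") else w for w in words}


def most_relevant_file(description: str) -> str:
    # Stand-in for the AI step: the path sharing the most words wins.
    words = tokens(description)
    return max(COMMITS, key=lambda path: len(words & tokens(path)))


def top_contributor(path: str) -> str:
    return Counter(COMMITS[path]).most_common(1)[0][0]


def assign(inp: Input) -> Output:
    return {"name": top_contributor(most_relevant_file(inp["description"]))}


print(assign({"description": "data pipelines are not working"}))  # {'name': 'alice'}
```

The input/output contract is the only part the business user specifies; everything inside `assign` is what the platform generates and tests.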

If a test fails, it fixes it and does a whole bunch of things in the background, and eventually it returns a suggested user. Then I can say, "Test this with more inputs and outputs."

So I test it with issues like "The database is down," or "Data is down," or whatever users actually say: "Pods are not scaling down whenever the traffic spike ends," whatever situation you have. It runs a bunch more tests, and once you're satisfied with the results (or not), you decide whether to continue.
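Testing with more inputs is essentially table-driven testing. A minimal sketch, assuming the automation is any function from an input dict to an output dict; the stub `assign` and the expected assignees are hypothetical.

```python
from typing import Callable

Automation = Callable[[dict], dict]


def assign(inp: dict) -> dict:
    # Hypothetical stand-in for the generated automation.
    text = inp["description"].lower()
    if "database" in text:
        return {"name": "dana"}
    if "pods" in text or "scaling" in text:
        return {"name": "carol"}
    return {"name": "alice"}


def run_cases(automation: Automation, cases: list[tuple[dict, dict]]) -> list[str]:
    """Run every case and describe each mismatch in plain language."""
    failures = []
    for inp, expected in cases:
        actual = automation(inp)
        if actual != expected:
            failures.append(f"{inp['description']!r}: expected {expected}, got {actual}")
    return failures


CASES = [
    ({"description": "The database is down"}, {"name": "dana"}),
    ({"description": "Pods are not scaling down when the traffic spike ends"}, {"name": "carol"}),
    ({"description": "Data is down"}, {"name": "dana"}),
]

for line in run_cases(assign, CASES):
    print("FAIL", line)
```

Here the third case fails, because the stub only routes on the word "database"; that is exactly the kind of result that prompts the business user to add or refine a rule.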

In this case I'm not satisfied, because I looked at Tom. Again, this is the kind of thing where I don't want to look at code. But I look at Tom and think, "Tom doesn't sound like the right guy. Show me all the users and their emails." And then I see, "Oh, Tom is from a different company, so I can't assign this to Tom, even though Tom is on our GitHub as an external contributor." So I add another rule: "Exclude anyone from an external company."
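The exclusion rule amounts to filtering candidates before ranking them. A sketch, assuming contributors are classified as internal or external by email domain; the domain, users, and addresses are hypothetical.

```python
from collections import Counter
from typing import Optional

# Hypothetical directory; "example.com" stands in for the company's domain.
INTERNAL_DOMAIN = "example.com"
EMAILS = {
    "tom": "tom@othercorp.test",
    "alice": "alice@example.com",
    "bob": "bob@example.com",
}


def is_internal(user: str) -> bool:
    return EMAILS.get(user, "").endswith("@" + INTERNAL_DOMAIN)


def top_internal_contributor(commit_authors: list[str]) -> Optional[str]:
    """Rank authors by commit count, skipping anyone outside the company."""
    counts = Counter(a for a in commit_authors if is_internal(a))
    return counts.most_common(1)[0][0] if counts else None


# Tom has the most commits but is external, so Alice is chosen.
print(top_internal_contributor(["tom", "tom", "tom", "alice", "bob", "alice"]))  # alice
```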

I keep modifying this automation and testing it. As soon as that's done, I hit the deploy button here. That's all I do as a non-technical person. This is our stock UI; the UI for the field folks, the admins, is more specialized for them.
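The deploy button is where the DevOps problem raised earlier shows up. The talk doesn't describe the actual lifecycle, so this is only one possible sketch, assuming a hypothetical draft, tested, deployed model: any edit drops the automation back to draft, and deploy is refused until the latest test run passes.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    DRAFT = "draft"
    TESTED = "tested"
    DEPLOYED = "deployed"


@dataclass
class Automation:
    name: str
    stage: Stage = Stage.DRAFT
    versions: list[str] = field(default_factory=list)  # deployed versions, oldest first

    def edit(self) -> None:
        # Any change to the rules invalidates earlier test results.
        self.stage = Stage.DRAFT

    def record_tests(self, failures: int) -> None:
        self.stage = Stage.TESTED if failures == 0 else Stage.DRAFT

    def deploy(self, version: str) -> None:
        if self.stage is not Stage.TESTED:
            raise RuntimeError(f"{self.name}: tests must pass before deploying")
        self.versions.append(version)
        self.stage = Stage.DEPLOYED


auto = Automation("assign-issue")
auto.record_tests(failures=0)
auto.deploy("v1")
print(auto.stage, auto.versions)  # Stage.DEPLOYED ['v1']
```

Keeping the version list also gives a non-technical user an obvious answer to "what is running right now?", which is part of what review and troubleshooting mean in this setting.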

Then it gets deployed. In the live demo, I would have shown you that when lots of different issues come in, like "Data pipelines are corrupting data," users get assigned. And then for nonsense requests like "Give me a new feature real fast, real cheap," I want a different, more complicated business rule.

So I go in and say (let me just skip to the end), "In case there is a generic request, assign it to a default person." I add that exclusion rule, and after checking it, I add another: "In case no relevant files are found, assign it to this person."
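Rules like these are fallbacks evaluated in a fixed order. A minimal sketch, assuming a hypothetical `DEFAULT_ASSIGNEE`, a naive phrase list for detecting generic requests, and a precomputed file-to-owner map; none of these come from the demo.

```python
from typing import Optional

DEFAULT_ASSIGNEE = "triage-lead"  # hypothetical default person
# Hypothetical markers of a generic request; a real rule might ask the model.
GENERIC_PHRASES = ("new feature", "real fast", "real cheap")


def is_generic(description: str) -> bool:
    text = description.lower()
    return any(phrase in text for phrase in GENERIC_PHRASES)


def assign_with_fallbacks(
    description: str, relevant_file: Optional[str], owner_of: dict[str, str]
) -> str:
    # Rule 1: generic requests go straight to the default person.
    if is_generic(description):
        return DEFAULT_ASSIGNEE
    # Rule 2: no relevant file found (or no known owner) -> default person.
    if relevant_file is None or relevant_file not in owner_of:
        return DEFAULT_ASSIGNEE
    # Otherwise, use the owner found by the earlier steps.
    return owner_of[relevant_file]


owners = {"analytics_pipeline.py": "alice"}
print(assign_with_fallbacks("Give me a new feature real fast, real cheap", None, owners))
print(assign_with_fallbacks("Data pipelines are corrupting data", "analytics_pipeline.py", owners))
```

The order matters: a generic request never reaches the file lookup at all, so its wording can't accidentally match some file.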

So I can keep adding business rules, testing, and deploying, and the whole system works. For me as a business user, there's really no difference between working with data and shipping business logic on data: if I can be confident the business logic works here, all I need is a guarantee that it keeps working the same way beyond this, once it's deployed.

That's roughly what this looks like. From a security point of view, the important thing is that the data layer is the part that keeps it real, so you can have as much vibe coding as you want in the layers above it, and none of that logic has to handle multi-tenant authorization rules and the like.
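A rough sketch of that split, assuming a hypothetical multi-tenant appointments table: the data layer applies the tenant filter itself, so whatever business logic runs on top only ever sees rows the caller is allowed to see. This illustrates the principle, not PromptQL's actual architecture.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class User:
    name: str
    clinic_id: str


# Hypothetical shared table holding rows for several clinics (tenants).
APPOINTMENTS = [
    {"clinic_id": "c1", "patient": "p1"},
    {"clinic_id": "c2", "patient": "p2"},
    {"clinic_id": "c1", "patient": "p3"},
]


def fetch_appointments(user: User) -> list[dict[str, Any]]:
    """Data layer: the tenant filter is enforced here, outside any generated logic."""
    return [row for row in APPOINTMENTS if row["clinic_id"] == user.clinic_id]


def run_user_logic(user: User, logic: Callable[[list[dict[str, Any]]], Any]) -> Any:
    # Generated business logic only receives rows the data layer released.
    return logic(fetch_appointments(user))


admin = User("clinic admin", "c1")
# Even logic that tries to return "everything" only sees clinic c1's rows.
print(run_user_logic(admin, lambda rows: [r["patient"] for r in rows]))  # ['p1', 'p3']
```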

So this AcmeQL plan runs strictly in user space, not in data space. As for impact: we're doing this for procedure code selection and appointment selection, and it has a $100-million-plus impact for them, which they've already started to realize and will keep realizing over the course of this year.

I want to end on this note, which is where I believe we're heading: instead of developers building software, I think we need to start building vibe coding platforms that are unique to our organizations. That's what will help.

We're at a booth here, PromptQL, so please check us out and see how we do the learning part. But that's my time. Thank you so much, folks. Thank you.

AI Automation that actually works: $100M, messy data, zero surprises - Tanmai Gopal, Hasura/PromptQL
