Trust, but Verify: Shreya Rajpal

00:00:00.600 |
Shreya Rajpal: Hey everyone, thank you for coming. 00:00:16.080 |
I am Shreya Rajpal, I am one of the co-founders and the CEO of Guardrails AI. 00:00:22.020 |
And today we are going to be talking about trust but verify, which is a new programming 00:00:27.280 |
paradigm that we need as we're entering Gen AI native application development. 00:00:35.940 |
Before we get started, a little bit about me. 00:00:39.120 |
As I mentioned, I'm currently at Guardrails AI. 00:00:41.580 |
In the past, I've spent about a decade or so working in machine learning. 00:00:45.120 |
Previously, I was the machine learning infrastructure lead at Predibase, which is a machine learning platform. 00:00:53.360 |
I spent a number of years in the self-driving car space, working across the stack of self-driving, 00:00:59.020 |
and before that did research in classical AI and deep learning. 00:01:02.020 |
Shreya Rajpal: Awesome, so we're seeing this massive explosion in AI applications over the last year. 00:01:13.680 |
And that's also why so many of you guys are here attending this. 00:01:17.180 |
We have folks from AutoGPT, which really took the world by storm and opened up so many possibilities. 00:01:25.340 |
We've seen a lot of really awesome applications in mental health, sales, even software engineering. 00:01:36.340 |
This is basically search interest for artificial intelligence over time. 00:01:41.400 |
You can really see that peak around where ChatGPT came out. 00:01:48.060 |
But if you think about where a lot of the reality is, or a lot of where the value lies today, even 00:01:54.060 |
though generative AI applications have seen the fastest adoption compared to a lot of these other consumer 00:02:00.060 |
applications, their retention right now tends to be lower. 00:02:05.660 |
So these are some graphs I borrowed from this really fantastic article by Sequoia. 00:02:11.720 |
And you can really see that the one-month retention for AI-first companies is lower than for non-AI-first companies. 00:02:24.900 |
A common symptom that a lot of people experience as they're working with generative AI 00:02:30.060 |
applications is, my app worked while prototyping, but it failed the moment I tried shipping it. 00:02:37.160 |
Or even the moment someone else tried testing this, it just behaved very unreliably. 00:02:41.880 |
But the root cause of this symptom is that machine learning is fundamentally non-deterministic. 00:02:51.740 |
For those of you who aren't familiar, we're going to dig deeper into what that really means. 00:02:56.000 |
So I'm guessing that a lot of you here have worked with traditional software systems before. 00:03:01.100 |
So if you think about a database, and querying that database to get an answer to some question about your data, 00:03:10.300 |
every single time you hit that database API, you are going to get the correct response. 00:03:18.100 |
And correct really means representative of whatever your true data actually is. 00:03:23.200 |
So this is completely irrespective of uptime and availability, et cetera. 00:03:30.560 |
This fundamental property allows you to build these really complex software systems which talk to each other and rely on each other's outputs. 00:03:39.720 |
But if you think about machine learning model APIs, this is not really the case. 00:03:45.140 |
That's because of the fundamental stochasticity that is part of machine learning systems. For 00:03:51.460 |
a lot of you that have worked with generative AI systems and LLMs in the past, you'll have seen that 00:03:56.460 |
even if you ask the same question multiple times in a row, you're going to end up seeing different responses. 00:04:04.780 |
And because of this, being able to build these really complex systems that talk to each other, 00:04:12.180 |
that rely on previous outputs, et cetera, becomes harder. 00:04:15.760 |
Because you have this issue of compounding errors that really kind of explodes. 00:04:22.640 |
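To make that non-determinism concrete, here is a minimal sketch (not from the talk) that sends the same prompt several times; it assumes the openai v1 Python client and an OPENAI_API_KEY in your environment, with the model name chosen purely for illustration.

```python
# Minimal sketch: the same prompt, asked several times, routinely produces
# different answers, even at temperature 0. Assumes the openai v1 client
# and an OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()
question = "List three risks of deploying LLMs in production."

for i in range(5):
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # any chat model works here
        messages=[{"role": "user", "content": question}],
        temperature=0,
    )
    print(i, resp.choices[0].message.content[:80])
```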
This is just diving deeper into the problem a little bit. 00:04:25.500 |
A lot of common issues come up as you work with these models: hallucinations. 00:04:30.240 |
That's a very buzzwordy thing that a lot of us here are familiar with. 00:04:33.660 |
But there's a lot of other issues, like getting the correct structure, or vulnerability to prompt injections. 00:04:41.180 |
And all of this is exacerbated by the fact that unlike all other previous generations of programming, 00:04:48.220 |
the only tool that is really available to you is English. 00:04:52.040 |
It's just the prompt that you can really work with. 00:04:55.600 |
So we end up in the scenario right now where use of LLMs is limited wherever correctness is really critical. 00:05:06.540 |
I love GitHub Copilot, it's on my badge as my favorite tool, but if GitHub Copilot is wrong, you just ignore it and move on. 00:05:15.480 |
Same as ChatGPT, the chat interface is really, really great because it's iterative, and you can give it feedback, and if it's incorrect, you can tell it why it's incorrect, and it can maybe give you something that's more appropriate. 00:05:27.540 |
But this is not the use case for a lot of really high value critical applications. 00:05:36.480 |
And so how do we add correctness guarantees to LLMs while still retaining their flexible nature that really allows them to adapt so well to so many tasks? 00:05:46.540 |
So I'm going to add this quick quote here by Alex Graveley, who is the creator of GitHub Copilot. 00:05:56.540 |
It's a very simple idea, which is that add a constraint checker to check for valid generation. 00:06:03.540 |
On violation, inject what was generated and the rule violation and regenerate. 00:06:09.480 |
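As a rough illustration of that idea, here is a minimal sketch of the verify-and-regenerate loop in plain Python (not the Guardrails library itself); `call_llm` stands in for whatever LLM client you use, and each check is assumed to return an error message or None.

```python
def generate_with_verification(prompt, call_llm, checks, max_retries=2):
    """Call the LLM, run constraint checks, and re-ask on violation."""
    output = call_llm(prompt)
    for _ in range(max_retries):
        violations = [msg for check in checks if (msg := check(output)) is not None]
        if not violations:
            return output
        # Re-ask: show the model its previous output and the rules it broke.
        reask_prompt = (
            f"{prompt}\n\nYour previous answer was:\n{output}\n\n"
            "It violated these rules:\n- " + "\n- ".join(violations) +
            "\nPlease answer again and satisfy every rule."
        )
        output = call_llm(reask_prompt)
    return output  # caller decides what to do if retries are exhausted
```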
So once again, we're trying to think about how programming paradigms change as we're working with this fundamentally non-deterministic technology. 00:06:17.480 |
So this is something that wasn't needed for the longest time because we're working with deterministic systems, but becomes very relevant now. 00:06:25.480 |
So interestingly, this tweet was actually pretty recent. 00:06:29.480 |
But Guardrails AI, the open source framework that implements this and kind of like builds a framework around this strategy, has existed for a little while longer from the beginning of this year. 00:06:43.480 |
So Guardrails acts as a safety firewall around your LLMs, and this kind of fundamentally introduces a novel paradigm that, once again, wasn't as necessary in the previous generations of software development. 00:06:58.480 |
So this is what a lot of the software development architectures for applications that you might build may look like, where you have some application, and then in that application you have a prompt that gets sent to an LLM, and then you end up getting some output or some response back. 00:07:17.480 |
So this is the new paradigm that we propose, and that Guardrails kind of implements as a framework, wherein every output that you get back passes through a verification suite. 00:07:29.480 |
And that verification suite looks at all of the functional areas of, you know, inconsistencies or risks that you are really sensitive to as an application builder, which may be very, very different from, you know, if you're building a code generation application, whereas if you're building, like, a healthcare chatbot. 00:07:46.480 |
So maybe, like, containing PII or PHI, like, sensitive information might be something you want to check against, or profanity filtering that out. 00:07:56.480 |
If you're building a commercial application, you might really care about the fact that there's no mention of any competitors, like, if you're building a McDonald's chatbot, like, nobody should be able to get your chatbot to say that Burger King's the best burger in town. 00:08:09.480 |
So, making sure that any code that you generate is executable within your environment, as well as that summarization or free-form text generation is true and grounded in a source that you know to be correct, and not just hallucinated from the model. 00:08:25.480 |
So, each of these ends up being an independent check that runs as part of this, like, comprehensive verification suite that allows you to build trust in the models and the ML applications that you're building. 00:08:38.480 |
So, the paradigm that we propose is: only use large language model outputs if your verification suite passes. 00:08:44.480 |
On failure, you can really hook into this very powerful capability that LLMs unleash, which is their ability to self-heal: if you tell them why they're wrong, they can often correct themselves. 00:08:59.480 |
And you can kind of go through this loop again if you have the, you know, latency budget or even the dollar budget or the token budget to implement this. 00:09:09.480 |
I'm going to, like, go over this very briefly, but under the hood how Guardrails does this is that it allows you to create what we call guards from, you know, different inputs. 00:09:20.480 |
So, you can use either a declarative model spec, such as RAIL, which is XML-based. 00:09:27.480 |
You can use pydantic models that implement, like, specific validation criteria and structure. 00:09:35.480 |
You can create a guard from all of these components. 00:09:38.480 |
If you want, you can add information about, you know, your prompt as well as the LLMs you want to use. 00:09:47.480 |
But at runtime, this guard will basically surround your LLM callable and then make sure that everything that you're sending in or getting out of the LLM is valid and correct for you. 00:09:59.480 |
So, for example, if your output is valid, you end up sending the output back to your application. 00:10:04.480 |
But if it's invalid, you go through this loop of looking at which constraint is violated or which check is violated. 00:10:13.480 |
And then on violation, you have a set of these policies including re-asking, which we touched on earlier. 00:10:20.480 |
Filtering or fixing, which is programmatically trying to correct output. 00:10:30.480 |
Or, you know, just no-op where you don't actively take an action, but you log and store what the outputs of those checks or verification was and, like, why that particular check failed. 00:10:42.480 |
And then you only return the output once you know you can trust whatever came out of the LLM. 00:10:49.480 |
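A minimal sketch of what that can look like in code, assuming the 2023-era Guardrails API (Guard.from_pydantic, prompt_params, and the pre-1.0 OpenAI client); exact names, template syntax, and return types vary across versions, so treat this as illustrative rather than exact, and the output schema below is hypothetical.

```python
# Sketch only: assumes the 2023-era Guardrails API and pre-1.0 OpenAI client.
import guardrails as gd
import openai
from pydantic import BaseModel, Field

class SupportAnswer(BaseModel):
    # Hypothetical output schema for illustration.
    answer: str = Field(description="Answer grounded in the help center articles")

guard = gd.Guard.from_pydantic(
    output_class=SupportAnswer,
    prompt="Answer the user's question using the context.\n${context}\n${question}",
)

context = "...your retrieved help center passages..."
question = "How do I change my password?"

# The guard wraps the LLM callable; when a check fails it applies the chosen
# policy (re-ask, fix, filter, or no-op with logging) before returning.
raw_output, validated_output = guard(
    openai.ChatCompletion.create,
    prompt_params={"context": context, "question": question},
    model="gpt-3.5-turbo",
)
```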
So within this framework, what Guardrails.ai does is it's a fully open source library that allows you to, A, create custom validators. 00:11:01.480 |
It orchestrates the whole validation and verification process for you to make sure that, you know, you're not taking on this, like, really kind of, like, often latency-intensive task of doing validation and make sure that it's done as efficiently as possible. 00:11:16.480 |
It's a library and a catalog of many, many commonly used validators across a bunch of use cases. 00:11:23.480 |
And it's a specification language that allows you to compile your requirements into a prompt. 00:11:29.480 |
So that, like, whatever specific validators you want to use are automatically turned into a prompt so that you know that, you know, those requirements are also being communicated to the LLM. 00:11:43.480 |
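As a toy illustration of that last point (not the library's actual compiler), requirement strings attached to your chosen validators can simply be appended to the prompt so the model sees them too; the validator names and wording below are made up for the example.

```python
# Toy illustration of compiling validator requirements into the prompt.
REQUIREMENTS = {
    "no-competitors": "Never mention competitor brands.",
    "no-profanity": "Do not use profane or offensive language.",
    "grounded": "Only state facts supported by the provided context.",
}

def compile_prompt(base_prompt: str, validator_names: list[str]) -> str:
    rules = "\n".join(f"- {REQUIREMENTS[name]}" for name in validator_names)
    return f"{base_prompt}\n\nFollow these rules:\n{rules}"

print(compile_prompt("Answer the customer's question.", ["no-competitors", "grounded"]))
```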
All right, so a common question, why do I need this? 00:11:46.480 |
Why can't I just use prompt engineering or, you know, a better fine-tuned model? 00:11:50.480 |
So, okay, so for some reason my rendering here is weird. 00:11:57.480 |
But controlling the outputs with prompts, including using retrieval augmented generation, which basically injects specific context into your prompt, doesn't act as a guarantee, right? 00:12:11.480 |
Even if you do all the prompt engineering in the world, there's nothing guaranteeing that those instructions will be followed. 00:12:18.480 |
We actually did this as an experiment for an unrelated thing where we used LLMs as evaluators. 00:12:24.480 |
We ran the exact same experiment five different times, changing, like, absolutely zero parameters with zero temperature and saw, like, different numbers across our benchmark, which is, you know, really fascinating and wouldn't really fly in, like, previous generations of machine learning. 00:12:39.480 |
And then second, prompts don't offer any guarantees. 00:12:44.480 |
LLMs don't, you know, always follow instructions. 00:12:48.480 |
The alternative is also, like, controlling the outputs with models. 00:12:52.480 |
So, first of all, it is very expensive and time-consuming to train a model. 00:12:57.480 |
In my past life, this was basically what I've done my whole life. 00:13:01.480 |
And I was so frustrated with this whole process that I joined a startup where my job was to make this process easier. 00:13:09.480 |
But it still requires compiling a large dataset, which is expensive, training a model over a bunch of hyperparameters, and then serving it. 00:13:19.480 |
And then if you aren't doing that and you're using, like, an LLM that's hidden behind a commercial API, you typically don't have any control over model version updates. 00:13:30.480 |
So I've kind of seen this where, you know, I mentioned, like, validations get compiled into prompts. 00:13:36.480 |
So I've kind of, like, observed where commercial models will get updated under the hood. 00:13:41.480 |
And so prompts that might have worked for you in the past will stop working just over time. 00:13:46.480 |
So how do these guardrails work under the hood, right? 00:13:52.480 |
There's no, like, one-stop-shop solution for a guardrail here. 00:13:58.480 |
It really depends on the type of problem that you're solving. 00:14:02.480 |
So a very reliable way, if possible, for implementing a guardrail is to ground it in an external system. 00:14:10.480 |
So let's say you're working in a code generation app. 00:14:13.480 |
A really good way to generate more reliable code is to actually hook up the output of the LLM into a runtime that basically contains application-specific data. 00:14:23.480 |
So we tried it for a lot of text-to-SQL applications, which is something that is supported as a first-class citizen in guardrails. 00:14:31.480 |
And we found that this re-asking framework, where you hook it up to, you know, a sandbox that contains your database and your schema, 00:14:39.480 |
really substantially improved the correctness of the SQL queries that you got. 00:14:48.480 |
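Here is a sketch of what grounding a text-to-SQL guardrail in an external system can look like: validate the generated query against a sandbox copy of the schema, and re-ask with the concrete database error on failure. `call_llm` is a placeholder, and sqlite is used purely for illustration; this is not the Guardrails implementation.

```python
import sqlite3

def sandbox_error(sql: str, schema_ddl: str):
    """Return None if the query plans cleanly against the schema, else the error."""
    sandbox = sqlite3.connect(":memory:")
    try:
        sandbox.executescript(schema_ddl)      # schema only, no production data
        sandbox.execute(f"EXPLAIN QUERY PLAN {sql}")
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        sandbox.close()

def text_to_sql(question: str, schema_ddl: str, call_llm, max_retries=2):
    prompt = f"Schema:\n{schema_ddl}\n\nWrite a SQL query to answer: {question}"
    sql = call_llm(prompt)
    for _ in range(max_retries):
        error = sandbox_error(sql, schema_ddl)
        if error is None:
            return sql
        # Re-ask with the concrete error so the model can repair the query.
        sql = call_llm(f"{prompt}\n\nYour previous query:\n{sql}\nfailed with: {error}\nPlease fix it.")
    return sql
```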
Another approach is to use rules-based heuristics: really looking into, OK, if I'm, let's say, trying to extract an interest rate from a really long document, 00:14:55.480 |
I know that interest rates always end with percentage signs. 00:14:59.480 |
And so that can be a simple rule that whatever I extract must always satisfy. 00:15:02.480 |
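For that interest-rate example, the heuristic can be as small as a regular expression over the extracted value; a minimal sketch:

```python
import re

def looks_like_interest_rate(value: str) -> bool:
    # Rule of thumb: an extracted interest rate should contain a percentage.
    return re.search(r"\d+(\.\d+)?\s*%", value) is not None

assert looks_like_interest_rate("The APR is 4.25%")
assert not looks_like_interest_rate("The APR is four and a quarter")
```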
You can try to use, like, traditional machine learning methods or high-precision deep learning classifiers. 00:15:08.480 |
So really you don't need the full power of an LLM to solve, you know, really basic constraints. 00:15:14.480 |
So trying to find, like, is there some type of toxicity in this output? 00:15:19.480 |
Does some type of output contain, you know, advice that is harmful for my users or is misleading my users in some way? 00:15:26.480 |
My favorite analogy to use is: you don't need a jackhammer to crack open a walnut. 00:15:32.480 |
So if possible, you know, some of the guardrails should use, like, smaller classifiers that are much more reliable and deterministic instead of, you know, using LLMs. 00:15:42.480 |
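As a sketch of that kind of lightweight guardrail, a small off-the-shelf classifier can screen outputs before anything reaches the user. The checkpoint name and its "toxic" label below are assumptions; substitute whatever small classifier you trust.

```python
from transformers import pipeline

# Assumed checkpoint; swap in any small toxicity classifier you trust.
toxicity = pipeline("text-classification", model="unitary/toxic-bert")

def is_toxic(text: str, threshold: float = 0.5) -> bool:
    result = toxicity(text[:512])[0]      # truncate to fit the model's context
    return result["label"].lower() == "toxic" and result["score"] >= threshold

print(is_toxic("Thanks for reaching out, happy to help!"))  # expected: False
```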
And then finally you can also use LLM self-reflection. 00:15:48.480 |
So we're going to walk through this example of how this works in practice for building a chatbot where you want to generate correct responses always. 00:15:58.480 |
So let's say you're an organization that has certain help center articles, and you want your users to be able to ask questions over those help center articles in a chatbot. 00:16:11.480 |
And you always generate, like, correct responses where correctness means no hallucinations, not using any foul language, so don't swear at your customers, and never mention any competitors. 00:16:26.480 |
Now, how do you really prevent hallucinations? 00:16:29.480 |
Like, that's a very fundamental question, right? 00:16:35.480 |
Provenance guardrails essentially mean that every LLM utterance should have some grounding in a source of truth, right? 00:16:44.480 |
Especially if you're building, like, retrieval augmented generation applications. 00:16:48.480 |
You make the assumption that, okay, I gave it this context. 00:16:53.480 |
What you want to make sure is that every output that is generated, you're able to pinpoint to where in the context, you know, your response kind of came from. 00:17:01.480 |
So this is one of the guardrails that, you know, exists in our catalog of guardrails. 00:17:06.480 |
Under the hood, there are a few different techniques that we employ, 00:17:11.480 |
including classifiers that are built on traditional NLI, natural language inference, models. 00:17:20.480 |
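As a crude stand-in for what a provenance check does (the real guardrail uses the techniques above), here is a sketch that flags any generated sentence that is not close to some chunk of the retrieved context; the encoder name and threshold are illustrative choices, not the library's.

```python
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # small, fast encoder

def unsupported_sentences(response: str, context_chunks: list[str], threshold: float = 0.6):
    """Return the sentences in `response` with no sufficiently similar context chunk."""
    sentences = [s.strip() for s in response.split(".") if s.strip()]
    sent_emb = embedder.encode(sentences, convert_to_tensor=True)
    ctx_emb = embedder.encode(context_chunks, convert_to_tensor=True)
    sims = util.cos_sim(sent_emb, ctx_emb)            # shape: sentences x chunks
    return [s for s, row in zip(sentences, sims) if row.max().item() < threshold]
```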
This is a very brief, you know, snippet of, like, how to configure a guard, where you can essentially, like, select from this catalog which guardrails you want to use. 00:17:32.480 |
So we've used provenance, profanity, no references to peer or competitor institutions. 00:17:38.480 |
And then you essentially wrap your LLM call with, you know, the guard that you've created. 00:17:45.480 |
So very briefly, let's say you get some question which is, like, how do I change my password on your application? 00:17:51.480 |
You have, like, some prompt that, you know, is constructed from your retrieval augmented generation application. 00:17:57.480 |
But because LLMs are very, very prone to hallucinating, the response hallucinates where that setting exists in your application. 00:18:07.480 |
When this passes through your verification suite, the provenance guardrail will essentially get triggered and will cause the LLM to go through this re-asking loop. 00:18:17.480 |
Where a re-ask prompt will automatically be constructed for you via guardrails, which will, like, pinpoint which part is hallucinated, give it the context again, and ask it to correct itself. 00:18:29.480 |
And then finally, the re-ask output, you know, it tends to be more correct. 00:18:33.480 |
And so we can kind of see here in this toy example that the output is, you know, corrected for you. 00:18:38.480 |
And finally, verification passes, and you can send this back to your application. 00:18:43.480 |
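Pulling the walkthrough together, here is a sketch of the whole chatbot flow with a small verification suite and one re-ask attempt; `call_llm` and `retrieve` are placeholders, the competitor list is hypothetical, and `unsupported_sentences` and `is_toxic` come from the earlier sketches.

```python
COMPETITORS = ["Burger King", "Wendy's"]   # hypothetical policy list

def check_provenance(answer, context_chunks):
    missing = unsupported_sentences(answer, context_chunks)
    return f"Unsupported claims: {missing}" if missing else None

def check_profanity(answer, context_chunks):
    return "Contains profanity or toxic language" if is_toxic(answer) else None

def check_competitors(answer, context_chunks):
    hits = [c for c in COMPETITORS if c.lower() in answer.lower()]
    return f"Mentions competitors: {hits}" if hits else None

def answer_question(question, call_llm, retrieve):
    context_chunks = retrieve(question)                 # your RAG retriever
    prompt = ("Answer using only this context:\n"
              + "\n".join(context_chunks)
              + f"\n\nQuestion: {question}")
    answer = call_llm(prompt)
    checks = (check_provenance, check_profanity, check_competitors)
    failures = [msg for check in checks if (msg := check(answer, context_chunks))]
    if failures:
        # One re-ask that pinpoints exactly what was wrong.
        answer = call_llm(f"{prompt}\n\nYour previous answer:\n{answer}\n"
                          f"violated these checks: {failures}\nPlease correct it.")
    return answer
```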
Very briefly, more examples of validators that you can create or that exist. 00:18:48.480 |
Never giving any financial or healthcare advice. 00:18:50.480 |
Making sure that any code that you generate is usable. 00:18:53.480 |
Never asking any private questions from your customers or mentioning competitors. 00:19:01.480 |
And then just to summarize what guardrails does for you, custom validations, orchestration of verification, 00:19:07.480 |
a catalog of commonly used guardrails, as well as automatic prompt compilation from your verification checks. 00:19:15.480 |
To follow along, you can look at the GitHub project, which is at shreyar/guardrails. 00:19:20.480 |
Our website with our documentation is guardrailsai.com. 00:19:24.480 |
Or you can follow me or the project on Twitter.