
Advanced Guardrails for AI Agents | Full Tutorial


Chapters

0:00 Why Guardrails
0:20 Guardrails for Agents
3:40 Sparse and Dense Vectors
8:00 Hybrid Guardrails with Python
12:29 Initializing the HybridRouter
15:01 Optimizing our Hybrid Guardrails
19:16 Testing our Hybrid Router
20:46 AI Guardrails in Context

Whisper Transcript

00:00:00.000 | Today, we are going to be taking a look at how we can build, to be honest, pretty spectacular guardrails for AI agents, chat applications, or just anything conversational,
00:00:12.200 | or anything where we need to classify incoming natural language queries from a user, or even from an LLM itself.
00:00:20.120 | When we're building these guardrails, there are a few different components that we would usually put together.
00:00:26.780 | We wouldn't just have one. We would typically have a routing layer, and that may even be broken down into multiple routing layers, or even a hierarchy of those.
00:00:38.080 | That routing layer may be our guardrails, or it could be the opposite. It could be defining the scope of what we can talk about, or maybe even both.
00:00:51.060 | So we could have what we're allowed to talk about over here in yellow, and what we're not allowed to talk about over here in red,
00:00:58.260 | and maybe we even overlap a little bit for very specific topics.
00:01:02.340 | Now, this fits in as follows: a user over here will be chatting, and their queries will be hitting our initial set of guardrails,
00:01:14.800 | and saying, "Okay, you can come through, or you cannot."
00:01:17.800 | Now, depending on whether we get an "okay, you can continue," or we hit a guardrail, we may do different things.
00:01:27.600 | With a guardrail, we may decide, "Okay, we're just going to provide a pre-written response,"
00:01:33.680 | or we might say, "Okay, we need to go to another LLM and tell it that it needs to deal with this situation in a sensible way."
00:01:42.120 | So that will be the first layer.
00:01:44.920 | Let's say we do get through our initial set of guardrails.
00:01:48.760 | Then what we might want to do is have our LLM down here.
00:01:54.560 | But of course, we're not just going to rely solely on those guardrails.
00:02:00.480 | We also want to be prompting our LLM to be as secure as possible as well.
00:02:05.280 | So there's another layer of protection also here.
00:02:08.540 | And that LLM, if a guardrail is hit, might decide, "Okay, we don't respond directly, or we provide a specific response that goes back to the user."
00:02:19.040 | Okay?
00:02:20.800 | Let's say that did get through.
00:02:22.320 | Then what we might do is we come back around to the user and we hit either the same guardrails or, more likely, a different set of guardrails
00:02:34.560 | to make sure that the LLM isn't responding to something that we don't particularly like.
00:02:40.480 | Now, this structure here is what I would call the essentials of guardrail protection for a public chat agent.
00:02:55.280 | There are still more things that we could add here.
00:02:58.240 | We could add things like specific codified rules that look for specific terms.
00:03:04.480 | Or we can have different classification models that also look at the incoming and outgoing text.
00:03:11.360 | But with what we have here, we can actually get pretty far.
00:03:16.320 | Now, what we are focusing on is this layer here.
00:03:21.440 | This is our semantic routing layer.
00:03:25.040 | And I've spoken a lot about the semantic router before.
00:03:28.320 | But what I want to talk about in this video is making semantic router better by not just relying on semantics,
00:03:36.480 | but also involving term matching.
00:03:40.480 | Now, what does that actually mean?
00:03:41.680 | So, let's start with the semantic router.
00:03:45.680 | Our user will say something.
00:03:48.160 | And what we do is we process a query through an embedding model such as OpenAI's text-embedding-3.
00:03:55.040 | And what that does is it creates this vector representation of their query within a semantic space.
00:04:02.720 | And what that means is that similar queries, when embedded with the same embedding model,
00:04:10.080 | would appear in a similar space to other queries that have similar semantics, i.e. they have a similar human meaning.
00:04:20.720 | So, for example, one of the queries that we're going to be using later is "Can I sell my Tesla?"
00:04:28.800 | Right, let's say that this is "Can I sell my Tesla?" in vector space.
00:04:33.280 | Okay, so we can imagine this as a 2D area.
00:04:37.200 | Let's say this one is "Can I sell my Tesla?"
00:04:40.480 | Then what we would find is, okay, over here we might have another query, which is "Can I sell my Pulsar?"
00:04:48.800 | Okay, which is another EV.
00:04:50.320 | And the human meaning of both of those queries is pretty similar.
00:04:54.400 | So they end up in a similar vector space.
00:04:56.480 | And that means that for our user query here, what they might be saying is "Can I sell my some other car?"
00:05:04.560 | Okay, and what the semantic router by itself is doing is looking at a user's query and it's saying
00:05:12.560 | "How similar is that user's query to a set of other queries that we have predefined?"
00:05:19.840 | Okay, so we could say here, okay, these two red queries,
00:05:22.880 | they are very similar and they are above a certain similarity threshold to our user's query.
00:05:29.440 | So we then know to flag the user's query as a particular route or a particular guardrail
00:05:36.480 | based on those predefined queries.
00:05:38.800 | Now this works pretty well, but there are some issues with it.
00:05:45.280 | So for example, let's say our use case is that we are a chatbot on an EV maker's website.
00:05:52.320 | Let's say our chatbot is from BYD.
00:05:56.800 | And we only want our users to be able to talk about BYD.
00:06:00.640 | We don't want them to talk about Pulsar.
00:06:02.080 | We don't want them to talk about Tesla.
00:06:04.080 | Now, the problem with semantics here is that the semantic meaning is general.
00:06:11.600 | It's not the semantic meaning based on whether this is BYD or Tesla.
00:06:16.320 | It's the semantic meaning of general language.
00:06:19.440 | So what you will find is if someone's asking about BYD,
00:06:23.360 | the semantic similarity between that and someone asking about Tesla is going to be pretty high.
00:06:28.480 | So this is where semantic routing alone can fall down.
00:06:33.040 | However, before we had dense embedding models, which create these semantic vectors,
00:06:39.440 | we had the more traditional embedding models, which are things like BM25 or TF-IDF.
00:06:47.920 | And the way that these work is that they actually look at the words or terms between two sentences.
00:06:55.680 | And where we see that there is a lot of term overlap, we would score them more highly.
00:07:02.720 | And with that, we can create what we would call sparse embeddings.
00:07:06.720 | And these sparse embeddings base their similarity scores on how much term overlap there is between
00:07:13.760 | two queries rather than the semantic similarity between them.
00:07:18.080 | So this is a really good use case for a scenario where we are a brand.
00:07:23.360 | And we want to say queries that include information about our brand are okay, we can talk about them.
00:07:30.000 | But queries about other brands, especially our competitors, we do not want to answer to them.
00:07:35.120 | But at the same time, we still do want to include that semantic similarity.
00:07:39.680 | Just because someone says something about BYD or Tesla, it doesn't mean that it would be in scope for our
00:07:45.760 | particular chatbot, which is something that semantic vectors can capture better.
00:07:51.520 | So we can merge these two methods and use a hybrid approach to create our guardrails.
00:07:59.760 | Now we're going to be working through this example here.
00:08:02.640 | It's in the semantic router docs.
00:08:04.240 | And I'm going to go ahead and just open this in colab.
00:08:07.440 | And we can see the initial setup.
00:08:09.280 | And we're going to start by just installing the semantic router library.
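A rough sketch of that install step (run in a notebook cell; package extras may vary by semantic-router version):

```python
# Install the semantic-router library.
!pip install -qU semantic-router
```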
00:08:12.960 | And then what we're going to do is create a set of routes.
00:08:18.160 | So these are the routes that we would like to allow.
00:08:21.840 | Okay, so in this case, we are BYD and we want a customer facing chatbot that allows users to ask
00:08:28.960 | about our products.
00:08:30.240 | But then we want to block our competitors products from being spoken about.
00:08:37.200 | So in this case, we are looking at Tesla, Pulsar, and Rivian.
00:08:42.720 | Now, all of the utterances that you see here are just example queries.
00:08:46.800 | So these are things that we could imagine users asking.
00:08:50.640 | And what we are essentially doing here is populating that semantic space,
00:08:55.280 | but also the sparse vector space.
00:08:57.840 | So we'll see that queries that mention BYD will share a high similarity to this
00:09:06.960 | route, whereas queries that include Tesla will share high similarity to this other route.
00:09:12.560 | So let's go ahead and run those.
00:09:17.680 | We gather all those routes within a single routes variable here.
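A minimal sketch of what those route definitions might look like; the utterances below are illustrative placeholders, not the exact ones from the notebook:

```python
from semantic_router import Route

# One "allowed" route for our own brand...
byd = Route(
    name="byd",
    utterances=[
        "Can I sell my BYD?",
        "How do I charge my BYD Seal?",
        "What's the range of the BYD Atto 3?",
    ],
)

# ...and "blocked" routes for competitor brands.
tesla = Route(
    name="tesla",
    utterances=[
        "Can I sell my Tesla?",
        "How much does a Model 3 cost?",
    ],
)

# Pulsar and Rivian routes are defined the same way, then gathered together.
routes = [byd, tesla]  # + pulsar, rivian
```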
00:09:23.760 | Now we will need an OpenAI API key for this part here.
00:09:27.760 | And we can get that from platform.openai.com/api-keys.
00:09:33.040 | And we should find ourselves on this page.
00:09:36.080 | So we're going to go ahead and create a new key.
00:09:38.800 | I'm just going to call this one the hybrid chat demo.
00:09:42.080 | Of course, you can call it whatever you like.
00:09:43.600 | Whatever is useful to you.
00:09:46.320 | And we're just going to run this cell and enter our API key there.
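A standard pattern for that cell, assuming the key is read from the environment or prompted for:

```python
import os
from getpass import getpass

# Use an existing OPENAI_API_KEY if set, otherwise prompt for one.
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or getpass(
    "Enter OpenAI API key: "
)
```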
00:09:51.680 | And what I just want to show you is I just want to show you some examples of similarity
00:09:57.200 | or semantic only similarity between various queries.
00:10:01.760 | Okay.
00:10:02.000 | So in this one, we have "Can I sell my Tesla?", and the same for Pulsar, BYD, and Rivian.
00:10:06.000 | And you can see the similarity between these is like 0.65, 67, 69.
00:10:12.800 | Okay.
00:10:13.280 | So fairly high similarities for this embedding model.
00:10:17.120 | Now, what about if we just talk about BYD?
00:10:22.400 | All right.
00:10:23.040 | So we're asking various queries about BYD.
00:10:25.840 | Let's just see how similar those are.
00:10:28.160 | Okay.
00:10:30.400 | And you can actually see that overall, the similarities between these queries here,
00:10:35.440 | which are all talking about BYD, actually score lower than the similarity scores that we got
00:10:42.080 | from asking how to sell the various EVs.
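Roughly how that comparison can be reproduced with the dense encoder alone; the exact model name and queries here are assumptions:

```python
import numpy as np
from semantic_router.encoders import OpenAIEncoder

encoder = OpenAIEncoder(name="text-embedding-3-small")  # dense semantic encoder

queries = [
    "Can I sell my Tesla?",
    "Can I sell my Pulsar?",
    "Can I sell my BYD?",
    "Can I sell my Rivian?",
]
embeds = np.array(encoder(queries))

# Cosine similarity of every query against the first one.
norms = np.linalg.norm(embeds, axis=1)
sims = embeds @ embeds[0] / (norms * norms[0])
print(sims)  # cross-brand "sell my X" queries score close to one another
```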
00:10:45.840 | So this is a perfect scenario where we can't just rely on semantic embeddings.
00:10:52.640 | And instead, we might want to bring in the hybrid approach.
00:10:57.040 | So to do that, we're going to be using a sparse encoder.
00:11:02.400 | And this will actually be using BM25, but it's a BM25 model that is compatible with vector search.
00:11:10.080 | Not all of them are.
00:11:11.040 | And it has also been trained on a large internet scale dataset.
00:11:16.160 | Now, what that does is essentially act as a proxy for fitting BM25 on your own dataset.
00:11:22.800 | Because in many cases, you might want to later train your own BM25 model on your own data.
00:11:29.360 | But you can actually use this and get pretty solid results out of the box.
00:11:34.240 | So for this, we will need another API key, which we get from platform.aurelio.ai.
00:11:42.080 | You would need to create an account.
00:11:44.400 | Now, assuming that you haven't already used the Aurelio platform, you can get some free credits to follow along with me on this video.
00:11:52.400 | And to do that, we'd go to billing, add credits, put $5 USD here.
00:12:00.640 | Go through to purchase, and then you just add your promotion code here.
00:12:07.840 | And that will give you $5 of free credits.
00:12:10.960 | Once you have your credits, you can then go to settings, API keys, and create a new API key.
00:12:18.880 | I'm going to, again, just call this hybrid chat demo.
00:12:23.120 | I'm just going to copy that and use it here.
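A sketch of that step, assuming the key lives in AURELIO_API_KEY and the BM25-style sparse encoder class is AurelioSparseEncoder; check the class name against your semantic-router version:

```python
import os
from getpass import getpass
from semantic_router.encoders import AurelioSparseEncoder

# Aurelio platform key for the hosted BM25-style sparse encoder.
os.environ["AURELIO_API_KEY"] = os.getenv("AURELIO_API_KEY") or getpass(
    "Enter Aurelio API key: "
)

# Sparse encoder trained on a large web-scale corpus, usable out of the box.
sparse_encoder = AurelioSparseEncoder(name="bm25")
```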
00:12:28.160 | Okay, great.
00:12:28.880 | So we have that, and we can now move on to setting up our hybrid router.
00:12:32.960 | So we're importing from semantic_router.routers.
00:12:36.080 | And rather than using the SemanticRouter, which I'm sure a lot of you have used, we'll be using the HybridRouter.
00:12:43.920 | And the only real difference in terms of using the HybridRouter is that we need to include this sparse encoder, which, of course, is what we have initialized up here.
00:12:54.720 | So we initialize our sparse encoder, pass it through to our hybrid router, and then we're actually good to go.
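A minimal sketch of that setup, assuming the encoder, sparse_encoder, and routes objects from the earlier cells; the auto_sync argument is an assumption and may differ by version:

```python
from semantic_router.routers import HybridRouter

router = HybridRouter(
    encoder=encoder,                # dense semantic encoder (OpenAI)
    sparse_encoder=sparse_encoder,  # BM25-style term-matching encoder
    routes=routes,
    auto_sync="local",              # keep the index in memory (assumed setting)
)
```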
00:13:00.800 | Now, one unique thing with hybrid is that it's much harder to define what a good similarity threshold is.
00:13:10.720 | Because your similarity threshold is a merger between your dense semantic vector space and your sparse term matching vector space.
00:13:20.960 | So we provide these default thresholds here, and these are mostly fine, depending on your use case.
00:13:30.640 | So in this use case, if we're just blocking mentions of these, but allowing mentions of these, and we don't really care about any other queries,
00:13:38.800 | this would actually probably be completely fine.
00:13:41.200 | But in the scenario where you have other conversations that you would allow without including them within our predefined set of utterances,
00:13:54.160 | this will not work as well.
00:13:57.600 | So let me go through how we can optimize our routes, and specifically those route thresholds, to support more use cases.
00:14:09.040 | So, okay, just to start with, if we are looking at very specific queries, okay, about Rivian, Pulsar, Tesla, BYD, we can actually get pretty good performance.
00:14:19.280 | Okay, so we can test that by creating a set of test data.
00:14:23.200 | Okay, we have our utterance on the left and the target route on the right.
00:14:31.440 | So we create a list of tuples that looks like that.
00:14:34.160 | From that, we get our list of X, which is the queries.
00:14:40.000 | So these items here.
00:14:41.600 | And then we have our targets, y, which are BYD, Tesla, and so on.
00:14:46.880 | Okay, and we evaluate our performance of our router on this test data set, which is a very small test data set.
00:14:54.240 | But nonetheless, you can see, okay, yeah, we get 100% accuracy.
00:14:57.280 | That's not too surprising, given we don't have many examples here.
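A sketch of that evaluation step; the utterances and route names below are placeholders that would need to match the routes defined earlier:

```python
# (utterance, target route) pairs; route names must match the Route names above.
test_data = [
    ("Can I sell my BYD?", "byd"),
    ("How fast does the BYD Seal charge?", "byd"),
    ("Can I sell my Tesla?", "tesla"),
    ("Can I sell my Rivian?", "rivian"),
]

# Split into inputs X and targets y.
X, y = zip(*test_data)

accuracy = router.evaluate(X=list(X), y=list(y))
print(f"Accuracy: {accuracy * 100:.2f}%")
```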
00:15:01.040 | However, if we make that a little more complicated, we add many more examples, and the utterances here are a lot more complex.
00:15:13.040 | So for example, for BYD, we're including mentions of Tesla and other EVs.
00:15:18.320 | And we're just including far more utterance-to-route pairs here than we did before.
00:15:26.640 | So that obviously increases the scope of what we are testing, which is important.
00:15:30.960 | You should always have as big a test data set as possible.
00:15:35.600 | So we have all these, and then we have also added these.
00:15:39.760 | So this is for if we would like to allow our users to talk about other things, which, honestly, in a lot of cases, we might not actually want to do, but it makes the problem harder.
00:15:52.640 | So I do want to, in this example, show you how we do handle that.
00:15:57.120 | So if we provide a None route within our test data, the hybrid router is going to treat that as: okay, this query should not trigger any routes.
00:16:09.520 | And it should pass through the router without any issues.
00:16:12.960 | Okay, so we just have a lot of generic queries here.
00:16:15.520 | Like, how do I start a vegetable garden?
00:16:17.920 | What was the best way to cook a steak?
00:16:19.760 | Who's the first person to walk on the moon?
00:16:21.680 | Like things that are nothing to do with our earlier queries.
00:16:25.120 | And it's also very important to include a lot of these.
00:16:29.520 | If you are aiming to build that sort of general-purpose chatbot that doesn't block everything.
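A sketch of how those out-of-scope queries get added; a target of None tells the router the query should not trigger any route:

```python
# Queries that should pass straight through (no guardrail) get a target of None.
test_data += [
    ("How do I start a vegetable garden?", None),
    ("What is the best way to cook a steak?", None),
    ("Who was the first person to walk on the moon?", None),
]

X, y = zip(*test_data)  # rebuild X and y with the new examples included
```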
00:16:36.960 | Okay.
00:16:37.600 | So we run that and we would pass that through to our evaluate method again.
00:16:44.240 | Before we do that, this is what X and Y look like.
00:16:47.680 | So what I mentioned before, X is our utterances.
00:16:50.800 | So the input data and Y includes our target routes.
00:16:55.840 | Okay.
00:16:56.240 | If you scroll all the way to the end,
00:16:57.440 | we should see some of those Nones as well.
00:17:00.480 | I'm not sure we need to do that.
00:17:02.160 | But if you like, you can set your thresholds here via the set_threshold method.
00:17:07.600 | Also, if you wanted to, you can set the threshold for all of your routes like this.
00:17:11.920 | But I don't think we necessarily need to do that, to be honest.
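If you did want to set thresholds by hand, it might look roughly like this; the set_threshold call is a sketch based on the method mentioned here, so verify the signature against your installed semantic-router version:

```python
# Manually set the threshold for a single route...
router.set_threshold(route_name="byd", threshold=0.5)

# ...or apply the same threshold to every route.
for route in router.routes:
    router.set_threshold(route_name=route.name, threshold=0.5)
```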
00:17:17.120 | So let me even show you.
00:17:18.720 | So we'll just stick with those default route thresholds and let's see what the accuracy is.
00:17:24.240 | Okay, pretty shocking to be honest, right?
00:17:29.360 | That's not great.
00:17:30.400 | Now, a big part of that is because these route thresholds here are so low that essentially all
00:17:36.720 | those None test cases are going to be hitting BYD or Tesla or Pulsar no matter what,
00:17:43.120 | even though they're completely unrelated.
00:17:45.280 | Because this is an incredibly low similarity score.
00:17:49.440 | So anything with a higher similarity than that is going to be triggering one of those routes.
00:17:54.000 | Which is fine if you're creating guardrails for everything.
00:17:57.840 | But it's not fine if you want to allow some pass through of unknown queries.
00:18:03.040 | So what we need to do is call the fit method.
00:18:06.400 | And what this will do is based on our test data up here, it's going to say,
00:18:10.880 | okay, I'm going to find the optimal thresholds given your test data set.
00:18:16.880 | So we run that.
00:18:18.640 | And we can see already, so this actually runs for probably far longer than it needs to.
00:18:25.360 | But you can see already that the accuracy here, it was a little lower at the start.
00:18:30.880 | It's increased and now it's at 94%.
00:18:32.720 | Now let's leave this to continue.
00:18:35.920 | We see it's increased up to 95, 96% now.
00:18:40.000 | We could possibly even leave it for longer, but that is pretty good.
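The fit call itself is essentially a one-liner over the labeled data; a sketch, assuming fit accepts the same X and y keyword arguments as evaluate:

```python
# Search for the per-route thresholds that maximize accuracy on our test data.
router.fit(X=list(X), y=list(y))
```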
00:18:44.000 | Okay, so now let's see what our new thresholds are.
00:18:46.800 | So we have 0.57, then 0.44 for Tesla.
00:18:52.320 | Pulsar, 0.34.
00:18:55.120 | And for Rivian, 0.71.
00:18:56.960 | Interesting.
00:18:57.920 | Okay, so these are thresholds that the hybrid router has identified as being
00:19:03.600 | the best performing thresholds for our test data set.
00:19:07.600 | And the accuracy, whereas before we had 51%, has now increased up to 96%, which is a huge improvement.
00:19:15.680 | And we can also evaluate that again.
00:19:17.760 | So it's 95.61% now, which, yeah, great improvement.
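Checking the result might look like this, assuming get_thresholds returns a mapping from route name to threshold:

```python
# Inspect the optimized thresholds and re-run the evaluation.
print(router.get_thresholds())  # {"byd": ..., "tesla": ..., "pulsar": ..., "rivian": ...}
accuracy = router.evaluate(X=list(X), y=list(y))
print(f"Accuracy: {accuracy * 100:.2f}%")
```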
00:19:23.280 | And it was not really that hard to do.
00:19:26.560 | Now, that is with a relatively small data set here.
00:19:31.280 | This is not huge.
00:19:32.160 | As I said, the bigger, the better.
00:19:36.000 | But what you can also do, which I like to do anyway, is you can go to like ChatGPT or some
00:19:42.800 | other LLM and just ask it.
00:19:45.280 | Okay, these are the scenarios I want to cover.
00:19:48.080 | These are the scenarios I want to let through.
00:19:49.920 | Can you create a big training data set for me?
00:19:52.960 | And it's actually pretty good at doing that.
00:19:55.360 | So I wouldn't rely solely on that.
00:19:58.320 | But you can at least fill up that data set quite easily by doing so.
00:20:05.360 | So with that, we can ask some other queries.
00:20:08.800 | Like, okay, can I buy a Tesla from you?
00:20:12.720 | Okay, I don't think that was in the train set, but I'm not sure.
00:20:16.480 | And we see, okay, route choice is Tesla.
00:20:19.920 | Okay, that's great.
00:20:21.120 | Now I can say, okay, something random.
00:20:24.720 | How much is a flight to Australia from Europe?
00:20:33.680 | Very generic, but fine.
00:20:36.640 | And we see that our route choice is none, so no guardrails were triggered.
00:20:42.320 | Okay, so that looks to be working pretty well.
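Querying the router directly looks roughly like this; a route name of None means no guardrail was triggered:

```python
# Competitor query: should hit the "tesla" guardrail route.
print(router("Can I buy a Tesla from you?").name)

# Unrelated query: should return None, i.e. no guardrail triggered.
print(router("How much is a flight to Australia from Europe?").name)
```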
00:20:45.680 | Now, that is actually it for this video.
00:20:48.400 | This is, as I mentioned at the start, one component of what should be
00:20:53.840 | multiple layers of guardrails,
00:20:55.680 | especially for anything that you're putting in production.
00:20:58.080 | That covers, as we've seen here,
00:21:00.240 | essentially the input guardrails using hybrid routes.
00:21:04.240 | But you should also, of course, cover, okay, what's coming out from your LLM.
00:21:08.560 | You should be covering your LLM prompting guardrails
00:21:12.960 | and putting in various other safety components to ensure that, okay,
00:21:18.720 | people aren't going to be misusing whatever it is that you're building.
00:21:22.960 | But as you can see, you can get pretty far already with these hybrid routers.
00:21:29.440 | And they're just insanely fast and also insanely cost-effective compared to,
00:21:36.000 | let's say, just using an LLM as your guardrails,
00:21:38.880 | whilst also providing that extra layer of safety.
00:21:43.360 | Now, that is it for this video.
00:21:46.640 | I hope all of this has been useful and interesting.
00:21:49.360 | But for now, I'll leave it there.
00:21:50.560 | So, thank you very much for watching.
00:21:52.800 | And I will see you again in the next one.