
Better Chatbots with Semantic Routes


Chapters

0:00 Semantic Router
0:37 Concept of Semantic Routers
7:42 Routes and Utterances
15:11 Encoders
16:26 New Routers
20:48 Semantic Routes for Chat
21:29 LLM Output Guardrails
28:25 Fine-grained control of LLMs
29:10 Routes for Tool Use
32:01 LLM Routing
34:34 Outro

Whisper Transcript

00:00:00.000 | Today, we're going to be talking about how
00:00:02.120 | we can use this concept of semantic routing
00:00:05.840 | or semantic routes to broaden the number of things
00:00:11.120 | that we can do within the context of chatbots and AI
00:00:15.680 | agents, and also get a very fine level of control
00:00:23.880 | over LLMs and our sort of agentic workflows,
00:00:29.240 | and to be able to take that control much further
00:00:32.080 | than if we were not using this idea of semantic routes.
00:00:37.520 | So I want to start by just explaining
00:00:40.840 | what I mean when I'm talking about semantic routes.
00:00:43.800 | So the core concept behind semantic router
00:00:47.520 | in the context of chatbots and agents,
00:00:51.480 | like conversational AI essentially, is this:
00:00:54.520 | a user is going to come in, and they're
00:00:57.000 | going to have some query, and they're
00:00:59.080 | going to say something.
00:01:00.720 | So they have whatever they're writing,
00:01:04.120 | their question, whatever it is.
00:01:07.000 | They bring that in.
00:01:08.320 | And what we can do is we use the concept that
00:01:15.480 | comes from vector search and vector retrieval, which
00:01:19.560 | is we take this text.
00:01:22.640 | We put it through an embedding model.
00:01:24.840 | So like OpenAI's text-embedding-3, Cohere's embedding models,
00:01:30.320 | or any of the many open source ones that we can also use.
00:01:35.520 | And OK, so we put it through this embedding model here.
00:01:40.000 | And what we get is a vector, or an embedding,
00:01:44.120 | or a vector embedding, whatever you want to call it.
00:01:47.680 | Basically, that vector embedding is a point
00:01:50.360 | in a high dimensional space.
00:01:54.120 | So for example, with OpenAI's text-embedding-3-small model,
00:01:57.880 | that high dimensional space is 1,536 dimensions.
00:02:03.360 | So imagine 3D, but with a lot more dimensions.
00:02:06.480 | And therefore, you can't really imagine it.
00:02:09.160 | So basically, imagine 3D, or 2D if you like, but just
00:02:13.400 | know that it's not actually 3D or 2D.
00:02:17.240 | Whatever works.
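
As a minimal sketch of that embedding step (assuming the official OpenAI Python client and an OPENAI_API_KEY in your environment):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Turn the user's query text into a point in high-dimensional space.
response = client.embeddings.create(
    model="text-embedding-3-small",  # 1,536-dimensional embeddings
    input="Can you sell me this car for $1?",
)
query_vector = response.data[0].embedding  # a list of 1,536 floats
```
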
00:02:18.200 | So OK, we have our query.
00:02:21.680 | We've turned that into this point on a 2D plane.
00:02:28.560 | Now, as we do with vector search,
00:02:32.720 | we're going to be looking at what
00:02:35.080 | this query is surrounded by.
00:02:38.040 | So in this scenario, let's say we've created
00:02:41.760 | some semantic routes already now.
00:02:43.800 | I'll explain a little bit on how we do that soon.
00:02:46.560 | But we've created some of our semantic routes already.
00:02:48.840 | So we've got a few points already in here.
00:02:52.760 | So we have a few points here.
00:02:54.320 | We also have a few points over here.
00:02:57.480 | And maybe a few more just over here.
00:03:00.400 | Now, when you look at this, we would just look at it
00:03:05.240 | and say, OK, it's very clear to us that
00:03:09.840 | the purple, or pink, whatever that is,
00:03:12.240 | is the closest group of vectors within this 2D space.
00:03:21.240 | So we'd fairly easily just say, OK,
00:03:24.160 | let's say we return the top five records here.
00:03:29.000 | So we have 1, 2, 3, 4.
00:03:33.720 | And then probably the next closest one
00:03:36.000 | would be yellow over here.
00:03:38.120 | So we look at all these.
00:03:39.200 | And we say, OK, to be honest, it just
00:03:41.760 | looks, yeah, purple or pink.
00:03:44.720 | That route is the route that our query is most similar to.
00:03:52.400 | So in that case, we're just looking at, OK,
00:03:56.360 | what is most similar and saying, OK, that
00:03:58.080 | is 100% the classification, which we don't necessarily do.
00:04:02.640 | But let's say we are in this scenario.
00:04:04.960 | We take that back.
00:04:06.840 | So we take this, and we're going to call it Route A.
00:04:13.400 | And now we know, OK, our user's query here.
00:04:17.480 | We've classified it as Route A. And therefore, we
00:04:20.640 | are going to do something.
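
To make the nearest-route idea concrete, here is a rough numpy sketch — not Semantic Router's actual internals, just the top-k similarity vote described above, with the utterance vectors and their route labels assumed to be pre-computed:

```python
import numpy as np

def classify(query_vec, utterance_vecs, utterance_routes, top_k=5):
    """Return the route whose utterances dominate the top-k most similar vectors."""
    # Cosine similarity between the query and every stored utterance vector.
    sims = utterance_vecs @ query_vec / (
        np.linalg.norm(utterance_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    top = np.argsort(sims)[::-1][:top_k]        # indices of the top-k matches
    names = [utterance_routes[i] for i in top]  # e.g. ["route_a", "route_a", "route_b", ...]
    return max(set(names), key=names.count)     # majority vote -> "route_a"
```
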
00:04:22.240 | And what you do is up to you.
00:04:24.400 | It doesn't-- like semantic router,
00:04:27.160 | you can use it in many different ways.
00:04:29.600 | It's up to you.
00:04:30.280 | We don't prescribe anything with it.
00:04:33.920 | But yeah, you would say, OK, Route A,
00:04:36.760 | let's say in this scenario, Route A is a guardrail.
00:04:41.400 | So if it's a guardrail, we might say, OK, I would just
00:04:44.240 | want to block whatever the user's doing.
00:04:47.000 | Or I just want to make the LLM more aware
00:04:51.200 | that it should be careful when it's answering
00:04:53.880 | this question, for example.
00:04:56.160 | So there were these examples early on,
00:05:00.960 | like as chatbots were first being released to people,
00:05:05.840 | where I don't remember which carmaker it was,
00:05:07.760 | maybe Renault or Volkswagen, or Ford even, I don't know,
00:05:15.320 | but people were talking to the chatbots
00:05:17.000 | and convincing the chatbots to sell them a car for $1.
00:05:23.200 | So you'd obviously just, in this scenario, what you could do
00:05:28.160 | is say, OK, this Route A here is actually
00:05:31.280 | a protective guardrail.
00:05:33.120 | We're saying, OK, if the user says something
00:05:35.360 | like, sell me this car for x price,
00:05:40.120 | you'd give an example here.
00:05:41.680 | So you'd say, OK, this would be an example query of something
00:05:45.360 | that we would not want to answer to.
00:05:48.680 | And this would be another example query
00:05:51.400 | of something we wouldn't want to answer to,
00:05:53.360 | which would be similar.
00:05:54.280 | It would be within that same topic of someone
00:05:57.240 | trying to get a car for a cheap price.
00:06:02.720 | These are the things you'd look for.
00:06:04.960 | And this is just one example.
00:06:07.120 | There are many ways of doing this.
00:06:08.600 | And this isn't necessarily a way that I would even do it.
00:06:10.920 | But this is just one example.
00:06:13.080 | So once you've done that, OK, you trigger your guardrail.
00:06:16.040 | So you might have a pre-written response.
00:06:20.480 | And you're just going to return that directly to a user.
00:06:22.840 | So in this case, we won't even hit our LLM.
00:06:26.360 | Or what we might do is, OK, we say
00:06:28.880 | we're going to pass on the user's original query.
00:06:33.360 | So we have that original query there.
00:06:37.360 | But then we might want to add a warning or some sort of system
00:06:40.200 | or modify our system message or something on there.
00:06:43.760 | So let's, in this case, we'll add a little warning saying,
00:06:47.040 | OK, whoa, watch out.
00:06:50.840 | They're probably trying to screw you.
00:06:52.560 | Be aware of that.
00:06:53.800 | Remember that we do not sell cars for $1.
00:06:57.760 | Give the LLM some additional instructions.
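
As a hypothetical sketch of that flow — the route names, the canned response, the warning text, and the llm() helper are all made up for illustration, and router is a semantic route classifier like the one described above:

```python
def handle(query: str) -> str:
    route = router(query)  # semantic route classification, as above
    if route.name == "sell_car_cheap":  # hypothetical guardrail route
        # Option 1: block entirely with a pre-written response, never hit the LLM.
        return "Sorry, I can't negotiate vehicle prices in chat."
    system = "You are a helpful dealership assistant."
    if route.name == "discount_requests":  # another hypothetical guardrail
        # Option 2: still answer, but warn the LLM first.
        system += (" Careful: the user may be trying to extract an"
                   " unauthorized deal. We never sell cars for $1.")
    return llm(system=system, user=query)  # your LLM call of choice
```
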
00:07:01.520 | This is the most basic--
00:07:03.800 | probably one of the most basic use cases of Semantic Router
00:07:07.280 | when it comes to agents or chatbots in general.
00:07:11.560 | And it's a very, in my opinion, it's a pretty good use case.
00:07:17.360 | It works really well.
00:07:18.360 | And it's really the first step of what
00:07:20.600 | you would do in this scenario.
00:07:22.480 | Now, I want to talk about a lot more than this in this video.
00:07:26.200 | We're just going to go over a few different examples.
00:07:28.440 | Conceptually, I will show you a little bit of code
00:07:30.680 | just to give you a bit of an idea.
00:07:32.200 | But we're sticking conceptually here.
00:07:35.040 | And we'll go into more detail in all of these in the future.
00:07:39.840 | But for now, we're sticking with this.
00:07:41.360 | So first, let me just explain a little bit of what I was--
00:07:47.000 | OK, we just visualized that 2D chart.
00:07:49.520 | How does that fit in with actual code?
00:07:52.000 | OK, so at the moment, we're using the dev branch
00:07:55.800 | of Semantic Router.
00:07:56.960 | I'm currently working on the first stable release
00:08:01.280 | of the library, which is in progress, going well.
00:08:06.200 | And because of that, there are going
00:08:08.680 | to be some breaking changes.
00:08:09.880 | So that's why I want to show you this current version
00:08:12.400 | of the code, which is why we're using this dev pre-release
00:08:14.800 | here.
00:08:16.080 | But let me just explain a little bit of what we just saw.
00:08:21.280 | So we had those points.
00:08:23.960 | So this is a different example.
00:08:26.480 | We-- OK, let's say this is our vector space.
00:08:30.840 | We have six utterances here, so we're going to put six points here as well.
00:08:34.960 | And then we're going to have our second route.
00:08:36.840 | This is the chitchat route here.
00:08:39.920 | And that one, let's say, sits a little bit closer together.
00:08:44.120 | It might include five different utterances.
00:08:47.200 | So what we have here is we have these utterances.
00:08:50.800 | Every single one of these is encoded with an encoding model.
00:08:56.960 | More on that in a moment.
00:08:58.840 | And it creates a vector or vector embedding
00:09:02.120 | within our 2D high dimensional space.
00:09:05.560 | It's not actually 2D.
00:09:06.480 | We're just visualizing it as 2D.
00:09:07.960 | So this area here, this area, this
00:09:14.640 | becomes basically a catchment, like a fuzzy matching
00:09:19.400 | area for this chitchat route.
00:09:24.320 | So this is our chitchat, fuzzy matching area in semantic space.
00:09:30.280 | Then we have this other one here,
00:09:32.240 | which is this politics route.
00:09:34.240 | And the politics route is exactly the same.
00:09:36.760 | So we have this fuzzy matching area here,
00:09:43.360 | and that becomes our politics route.
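
In code, those two routes and their utterances look something like this (the utterances here follow the library's docs examples):

```python
from semantic_router import Route

politics = Route(
    name="politics",
    utterances=[
        "isn't politics the best thing ever",
        "why don't you tell me about your political opinions",
        "don't you just love the president",
        "they're going to destroy this country!",
        "they will save the country!",
        "the politicians are ruining the economy",
    ],
)

chitchat = Route(
    name="chitchat",
    utterances=[
        "how's the weather today?",
        "how are things going?",
        "lovely weather today",
        "the weather is horrendous",
        "let's go to the chippy",
    ],
)

routes = [politics, chitchat]
```
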
00:09:48.460 | So then what happens is, let's say a user comes in,
00:09:53.160 | they make a query, it drops over here.
00:09:56.280 | This is something we didn't see in the last little sketch.
00:09:59.120 | So this is outside of these lines here.
00:10:03.160 | And these lines, they're actually
00:10:04.600 | defined by something called a threshold or score threshold,
00:10:10.240 | which is another variable that you can--
00:10:13.600 | I didn't put it here.
00:10:15.040 | There's another variable that you can define.
00:10:18.040 | And you can also optimize automatically and stuff.
00:10:21.320 | So I think, by default, with OpenAI's Embed 3 model,
00:10:26.640 | this would just default to 0.3, which
00:10:28.880 | is kind of like a good place where it seems anything
00:10:31.880 | below that is kind of not that similar,
00:10:36.720 | whereas anything above that is similar.
00:10:39.320 | But what you will tend to find is that some routes that you
00:10:44.680 | define really need a higher threshold or a lower threshold.
00:10:50.360 | And that is basically the sensitivity
00:10:52.240 | or the area of this catchment area
00:10:55.920 | that you are generating.
00:10:58.320 | So if you were to decrease your score threshold,
00:11:03.120 | let's say for the politics one, you'd
00:11:05.280 | basically just be making it catch more stuff.
00:11:09.080 | It wouldn't overlap with your other chitchat route.
00:11:13.200 | There's always going to be almost
00:11:14.840 | like a boundary between them.
00:11:17.200 | But you would be increasing or decreasing
00:11:19.680 | that based on your threshold, essentially.
00:11:24.280 | So what that becomes is,
00:11:28.440 | I think, at least for the text-embedding-3 models,
00:11:33.720 | anything at a threshold of 0.
00:11:36.320 | Some models' scores go to minus 1 and so on,
00:11:38.640 | so that's why I'm not sure.
00:11:40.120 | But I think at 0, basically anything
00:11:44.640 | will trigger this route.
00:11:48.520 | So in that scenario, what would probably happen,
00:11:52.080 | this purple route here would basically
00:11:54.240 | just take up the entire vector space,
00:11:56.360 | except from this chitchat area.
00:11:59.080 | So all of this would be like the--
00:12:01.760 | sorry, not even all.
00:12:04.560 | All of everything here would be the catchment area
00:12:08.920 | for your politics route.
00:12:12.080 | And that is what would happen if you went with 0.0.
00:12:18.320 | Then on the other hand, on the other end of the spectrum
00:12:22.680 | there, the 1.0, again, this will vary by model,
00:12:28.640 | by encoder or embedding model, which, again, I'll
00:12:32.960 | show you in a moment.
00:12:34.560 | But basically, what that would do
00:12:36.240 | is say, OK, the only things that are going to match
00:12:39.280 | are the things that are an exact match of the, well,
00:12:45.160 | whatever utterances you've defined here.
00:12:47.560 | So these are the utterances.
00:12:48.960 | These are the exact phrases that have
00:12:51.680 | been encoded in the vector space.
00:12:53.720 | And if you have a threshold set of 1,
00:12:57.920 | then basically the user's query needs
00:13:00.360 | to fit exactly where one of the utterances in this politics
00:13:08.920 | route is.
00:13:10.360 | So the user would literally have to say,
00:13:13.400 | isn't politics the best thing ever
00:13:15.240 | with this exact format and punctuation and everything?
00:13:21.760 | They would have to say that exact thing.
00:13:24.840 | So that's like the sliding scale that you have of these,
00:13:29.280 | like the sensitivity, essentially.
00:13:31.840 | Maybe sensitivity is even a better parameter name
00:13:35.720 | for that, now that I think about it.
00:13:37.680 | But we're rewriting stuff at the moment, so maybe.
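
Sketch of setting that per-route sensitivity explicitly — score_threshold is a real Route field, and 0.3 is the rough default mentioned above:

```python
from semantic_router import Route

politics = Route(
    name="politics",
    utterances=["isn't politics the best thing ever"],
    # Toward 0.0 this route catches almost anything in the space;
    # toward 1.0 it only matches near-exact repeats of the utterances.
    score_threshold=0.3,
)
```
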
00:13:42.720 | So yeah, in this case, let's say we
00:13:46.600 | are sticking with the score threshold of 0.3.
00:13:50.120 | You have your query come in.
00:13:51.600 | And what it's going to do is, OK, you're
00:13:53.320 | going to-- what is basically going to happen,
00:13:55.920 | it's going to look at, I think, the top five by default,
00:13:59.560 | if I'm not wrong.
00:14:00.680 | So it's going to look at these.
00:14:02.560 | It's going to calculate the similarities
00:14:05.320 | between this and the query.
00:14:06.680 | And what it's going to see is that actually, OK, this--
00:14:09.440 | probably the chitchat route here is the most similar.
00:14:13.760 | So we have our user's query.
00:14:15.640 | It is most similar, so it's kind of tied to our chitchat route.
00:14:21.360 | But it is not similar enough to surpass this threshold
00:14:26.880 | that we've set here.
00:14:29.080 | So in that case, what will happen
00:14:31.960 | is this will come out to having a route--
00:14:37.000 | we call it a route choice, actually--
00:14:39.400 | of just none.
00:14:42.400 | It's basically no classification.
00:14:45.160 | And yeah, I mean, that is how they map and how they work.
00:14:51.600 | So utterances, routes, and thresholds or sensitivity,
00:14:57.640 | however you want to think about it.
00:14:59.800 | And actually, before I do move on,
00:15:02.080 | I just want to point out here, I've created this little routes
00:15:05.960 | variable, which contains both our politics
00:15:08.920 | and that chitchat route.
00:15:11.280 | OK, so very quickly on the encoders,
00:15:13.680 | the encoder is just that step in the middle.
00:15:15.760 | So you have your question here, or even the vectors
00:15:22.320 | that you create beforehand, or the utterances
00:15:26.560 | that you define beforehand.
00:15:28.680 | And you're basically just putting it
00:15:30.180 | through that encoding model.
00:15:31.920 | So in this case, if we're using an OpenAI encoder,
00:15:37.200 | that would be text-embedding-3-small, I think, if I'm not wrong.
00:15:43.120 | It would be that model.
00:15:44.080 | Or if you're using Cohere or another provider,
00:15:47.400 | it would be the equivalent encoder.
00:15:51.520 | And you can decide, if you say, OK,
00:15:53.640 | I don't want to use text-embedding-3-small,
00:15:55.960 | I want to use text-embedding-3-large, you can do so.
00:15:58.360 | So you just-- I think it's a name parameter.
00:16:03.800 | And then you just define which model you want to use there.
00:16:07.120 | So yeah, you'd create your embedding, and that's it.
00:16:10.680 | That's what the encoders are.
00:16:12.040 | There are a ton of them in the library already,
00:16:14.480 | and more coming all the time.
00:16:17.140 | You have sparse, and dense, and whatever else in there.
00:16:20.920 | So there are a few options.
00:16:23.920 | But more on all that later, not in this video.
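
For example, choosing an encoder and overriding its default model via the name parameter (the exact model names here are assumptions about each provider's offerings):

```python
from semantic_router.encoders import CohereEncoder, OpenAIEncoder

# Swap the default text-embedding-3-small for the larger model.
encoder = OpenAIEncoder(name="text-embedding-3-large")

# Or use another provider's equivalent encoder instead:
# encoder = CohereEncoder(name="embed-english-v3.0")
```
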
00:16:26.680 | So the-- OK, this is one of the things that have changed.
00:16:30.800 | So 0.0.72, if I'm not wrong, that
00:16:38.080 | was the last old version of the library.
00:16:42.080 | Now we're going on to 0.1.0.
00:16:45.480 | One thing that's changed, that's this.
00:16:48.280 | Well, OK, there are many changes, but most of them
00:16:52.400 | are not as obvious as this one, which
00:16:54.560 | is we now have semantic_router.routers,
00:16:56.720 | and we have the SemanticRouter class.
00:16:59.200 | This is what used to be the RouteLayer.
00:17:06.600 | And the RouteLayer, it was its own specific thing.
00:17:11.120 | What we've done with this new refactor
00:17:14.040 | is just standardize things a lot,
00:17:15.840 | which means it's much easier for us
00:17:17.640 | to go and build new routers using different techniques.
00:17:23.120 | So it's not just about semantic vector search.
00:17:26.400 | There are a lot of other things that we can do,
00:17:28.960 | a lot of ways we can enhance that.
00:17:31.040 | But again, something for a future video, not this video.
00:17:37.080 | But beyond that, there isn't too much difference.
00:17:40.360 | So the RouteLayer, in the past, would have this encoder.
00:17:43.640 | It would have these-- you would insert your routes.
00:17:48.520 | The one thing that is different is
00:17:49.880 | we have this auto_sync option.
00:17:51.800 | This auto_sync option, before, used
00:17:53.800 | to belong to your index objects in the library.
00:17:58.320 | Basically, just saying-- it's not super important,
00:18:01.600 | but right now, just set auto_sync to "local".
00:18:05.560 | In the future, I will explain this in more detail.
00:18:10.280 | But for now, it's just synchronizing, essentially,
00:18:13.560 | between your route layer and your index.
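
Putting that together with the new 0.1.x class (in place of the old RouteLayer), roughly:

```python
from semantic_router.routers import SemanticRouter

router = SemanticRouter(
    encoder=encoder,    # from the encoder example above
    routes=routes,      # [politics, chitchat]
    auto_sync="local",  # keep the local index in sync with the defined routes
)
```
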
00:18:15.280 | And we can see that route choice thing that I mentioned before.
00:18:19.200 | So we had our query go in, and we have some politics ones
00:18:26.400 | up here, our politics routes.
00:18:30.680 | You know, I did this in the wrong order, but fine.
00:18:33.360 | We have our politics route here.
00:18:35.520 | We have our chitchat route over here.
00:18:42.080 | All right, now let me redo that.
00:18:44.360 | So let's say we're going to call the semantic router with,
00:18:49.040 | don't you love politics?
00:18:50.760 | So it's going to go in, and it's going to be like, oh, OK,
00:18:53.520 | maybe it's kind of conversational.
00:18:54.920 | It's a bit chitchatty.
00:18:57.040 | So maybe it goes towards that direction,
00:18:59.440 | but it's still very firmly within that politics route.
00:19:04.440 | And that is what you would get here, right?
00:19:06.360 | So you see you've got the name politics, function calls,
00:19:09.760 | something to talk about in the future, similarity score,
00:19:13.560 | another thing to talk about in the future,
00:19:16.480 | but actually something you need to work on.
00:19:18.480 | So the main thing I want to focus on here
00:19:20.560 | is just pretend--
00:19:21.880 | actually, just pretend these don't exist.
00:19:24.040 | They're not there.
00:19:25.560 | So route choice, politics.
00:19:29.000 | Now let's do another one.
00:19:31.080 | We're going to say, OK, how's the weather today?
00:19:34.360 | Very firmly in the chitchat route.
00:19:40.840 | You're comparing.
00:19:42.360 | We're seeing, all right, the most similar ones here.
00:19:45.680 | Even if we're catching one from the other politics route,
00:19:48.800 | which we probably wouldn't in this scenario,
00:19:50.600 | but even if we were, we'd be looking
00:19:53.040 | at the overall scoring between everything.
00:19:56.840 | And it would say quite firmly, OK, here I
00:19:58.960 | am, I'm within this chitchat area.
00:20:03.840 | And that is what we would get, ignoring these things
00:20:07.360 | that we don't need to go through.
00:20:09.440 | So we get chitchat.
00:20:11.600 | Then we can say something else.
00:20:13.440 | I'm interested in learning about Llama 2.
00:20:16.400 | In this scenario, it's not really either of those.
00:20:18.640 | So we might have something that's kind of over here.
00:20:21.600 | We see, OK, we've got these.
00:20:23.800 | They're kind of the nearest neighbors.
00:20:27.960 | But none of the similarities here cross that threshold.
00:20:33.320 | So unfortunately-- well, not unfortunately.
00:20:36.640 | It's quite fortunate.
00:20:39.520 | We would return none for our categorization.
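
In code, those three calls come out roughly like this (assuming the router defined earlier):

```python
print(router("don't you love politics?").name)   # -> 'politics'
print(router("how's the weather today?").name)   # -> 'chitchat'
# Below every route's threshold, so the RouteChoice name is None:
print(router("I'm interested in learning about Llama 2").name)  # -> None
```
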
00:20:43.320 | And I mean, that high level, that
00:20:46.920 | is what semantic route is doing.
00:20:48.400 | But then you think, OK, I'm doing
00:20:51.240 | this for the inputs of some queries from a user.
00:20:57.400 | And OK, I can classify the user's input query.
00:21:02.920 | But what can I use that classification for?
00:21:05.040 | Do I need to use it in this way?
00:21:06.360 | No, you don't.
00:21:07.360 | You can use it in so many different ways,
00:21:13.040 | and probably many other ways I can't even think of.
00:21:15.120 | But let me just at a very high level mention some of those.
00:21:21.680 | So the first thing that I would like to mention
00:21:26.560 | is, if we go back to here, this first example I gave,
00:21:31.600 | someone's basically trying to hack the agent
00:21:36.040 | to get it to do something that we don't actually
00:21:38.840 | want it to do.
00:21:41.040 | So we have this structure.
00:21:43.120 | Now, what if we modify that structure a little bit?
00:21:46.760 | Now, the reason I say let's modify the structure
00:21:49.000 | a little bit is because there's a very obvious weakness
00:21:51.800 | with this, which is it's very hard to predict all the ways
00:21:55.320 | that a user is going to try and hack your system to say
00:21:57.960 | something that you don't want it to say.
00:21:59.920 | And actually, not all the time, but in many cases,
00:22:03.600 | it can be much easier to predict what the agent should not say
00:22:08.960 | or the LLM should not say.
00:22:11.520 | So you probably have a particular domain
00:22:16.080 | that you would like to talk about.
00:22:17.760 | You have particular things that you
00:22:19.520 | don't want your agent to say.
00:22:21.720 | I don't want to sell a car for some low price.
00:22:25.560 | I actually just don't want to tell it.
00:22:27.240 | I probably don't want to tell people, OK,
00:22:29.040 | you can buy this car for this price.
00:22:30.760 | It is just a bad idea.
00:22:33.800 | So what we should instead be focusing on here
00:22:37.920 | is not what the user is saying, but what our LLM is saying.
00:22:42.080 | So let's go back.
00:22:45.960 | Let's modify this.
00:22:47.640 | So we have our user's query at the top here.
00:22:50.080 | So there.
00:22:51.120 | The queries or the utterances that we're setting here,
00:22:55.720 | let's say rather than it being these questions that
00:23:00.960 | are user-focused, so some question from a user,
00:23:04.600 | rather than it being that, let's say
00:23:06.680 | these are statements or responses from our agent,
00:23:12.920 | ones that we don't want it to say.
00:23:14.800 | So for this query, we're going to say,
00:23:17.000 | yes, you can have this car for $1.
00:23:21.120 | We don't want that to happen.
00:23:23.800 | We can give another example.
00:23:25.400 | Yes, we can do 50% discount.
00:23:29.040 | This will be this query here.
00:23:30.760 | And then we can give many different examples.
00:23:33.480 | These other routes that we have here
00:23:35.720 | might actually be similar.
00:23:37.520 | So they might be, oh, we can finance your car or something.
00:23:44.280 | Similar, but slightly different.
00:23:46.000 | And that's why they would be in these different routes.
00:23:48.320 | But they still might be these sort of protective guardrails
00:23:53.160 | against giving out financial deals
00:23:56.240 | that you actually don't want it to give.
00:23:59.080 | So in this scenario, these routes here
00:24:02.320 | are actually basically outputs from your agent,
00:24:07.920 | not inputs from your user.
00:24:10.600 | So this user query comes in.
00:24:12.560 | And whereas before, we were passing it
00:24:14.520 | to the embedding model here, we're not going to do that.
00:24:18.080 | Instead, let's say the circle is our LLM.
00:24:22.360 | So our LLM is going to generate something.
00:24:25.280 | Actually, let's make our LLM a different color.
00:24:28.240 | Oh, no.
00:24:29.280 | OK, it's red.
00:24:31.000 | So our LLM is going to create some output.
00:24:36.720 | It's not good.
00:24:37.720 | So it's giving someone 75% discount.
00:24:43.120 | And we just don't want that.
00:24:44.800 | So what we're going to do now is we're going to take this.
00:24:48.560 | And we're going to put it through our embedding model.
00:24:51.400 | And we're going to come over here.
00:24:54.480 | And we put it in.
00:24:56.640 | It's probably-- OK, discount is around here.
00:25:00.400 | So we put it in there.
00:25:01.280 | We compare.
00:25:01.780 | We're like, oh, no, this is not very good.
00:25:03.520 | We've produced some bad output that we don't want.
00:25:07.600 | We just don't want this to happen.
00:25:09.240 | So there's different things you could do in this scenario.
00:25:13.680 | One thing is maybe you just want to have a pre-written response.
00:25:17.560 | This is the safest thing to do.
00:25:19.880 | So you just take from your wherever, your database,
00:25:23.640 | your code, whatever you have.
00:25:26.640 | So you have your pre-written response
00:25:28.180 | if this particular route is triggered.
00:25:31.320 | So we're going to say, ah, OK, we're not going to do that.
00:25:34.060 | We're going to come straight over here.
00:25:35.680 | We're going to go to our pre-written response.
00:25:37.560 | And we're going to return that to the user.
00:25:42.640 | Done.
00:25:44.200 | Easy.
00:25:45.440 | The other option, which is, I would say, more risky,
00:25:49.660 | but it's an option.
00:25:51.320 | You can do it, is you can come--
00:25:55.040 | OK, you can take what you've done here.
00:25:59.600 | You're going to come over here.
00:26:02.240 | You've got some additional prompting.
00:26:05.880 | Or maybe you completely swap out the LLMs system prompt.
00:26:09.760 | And you tell it, OK, you're not a sales person anymore.
00:26:13.040 | Now you're a protective defense against people
00:26:20.080 | trying to scam the company.
00:26:23.600 | But you stop them from scamming you in a nice way
00:26:27.760 | with polite language.
00:26:29.320 | You just modify something to basically put it
00:26:33.720 | into safety mode.
00:26:35.820 | Or you just add a warning here again.
00:26:39.480 | OK, don't-- something.
00:26:43.200 | You can also do that if you want.
00:26:45.120 | But in any case, whichever approach you go for,
00:26:48.680 | you're going to basically generate another output.
00:26:52.400 | Hopefully, this time, the output is good.
00:26:55.720 | If it's not, then again, maybe you just
00:26:57.320 | want to default to that backup query from over here.
00:27:03.760 | It's up to you.
00:27:04.600 | But the benefit of going with this route, where you are
00:27:07.840 | actually generating again, is obviously
00:27:11.000 | the agent or chatbot can still seem quite fluid.
00:27:15.960 | So it goes through here, it does everything.
00:27:18.040 | And we say, OK, this is passed.
00:27:20.160 | And now we have a much more fluid response
00:27:23.080 | to a user that isn't pre-written, which can--
00:27:26.400 | often in a chat, a pre-written thing is kind of weird.
00:27:32.440 | And it can really throw things off.
00:27:35.000 | So yeah, I mean, you have that.
00:27:38.760 | That is a different approach to doing this whole thing.
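
A hypothetical sketch of that output-guardrail flow — llm(), the prompts, the canned responses, and output_router (a router whose utterances are bad agent outputs rather than user inputs) are all illustrative names:

```python
def guarded_generate(query: str) -> str:
    output = llm(system=SALES_PROMPT, user=query)  # normal generation first
    route = output_router(output)                  # route the LLM's *output*
    if route.name is None:
        return output                              # nothing triggered, send as-is

    # Option 1 (safest): return a pre-written response for this route.
    # return CANNED_RESPONSES[route.name]

    # Option 2 (riskier, but more fluid): regenerate in safety mode.
    retry = llm(system=SAFETY_PROMPT, user=query)
    if output_router(retry).name is None:
        return retry
    return CANNED_RESPONSES[route.name]            # fall back to the safe default
```
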
00:27:43.520 | And then just very quickly, there
00:27:46.960 | are so many use cases that we can talk about and talk
00:27:51.960 | about them in depth.
00:27:54.680 | I just want to give you a very high-level view
00:27:57.560 | of a couple of others that I think
00:27:59.400 | are quite interesting and cool.
00:28:01.240 | And focusing on chat, we can actually
00:28:05.840 | use Semantic Router in different modalities,
00:28:08.160 | like with images and stuff.
00:28:10.600 | A lot of different use cases there.
00:28:13.440 | But I just want to focus on the chat use case,
00:28:16.560 | or in particular, the language-focused or language
00:28:22.000 | modality chat use cases.
00:28:25.120 | So of course, we have the guardrails,
00:28:28.800 | which we've just spoken about.
00:28:31.560 | And we kind of touched upon not just using them as guardrails,
00:28:34.920 | but almost using them as behavioral modifications,
00:28:39.160 | where you are essentially, OK, you're
00:28:41.320 | going to modify the incoming query or the incoming user
00:28:46.680 | prompt, or user query, whatever.
00:28:51.800 | You modify that, or you modify the system prompt,
00:28:57.320 | or you add additional instructions
00:28:59.920 | to the system prompt, or you add in an additional system
00:29:02.720 | prompt in a different place.
00:29:05.480 | There's so many different things you can do there
00:29:08.440 | to modify basic behavior.
00:29:10.680 | Oh, another thing as well on that--
00:29:12.240 | it actually kind of comes under behavioral modification,
00:29:17.600 | but I do want to point it out as a use case, which is tool use.
00:29:22.600 | So if you trigger a particular route,
00:29:26.800 | you can, in several different ways,
00:29:29.240 | you can modify the tool use of your agent,
00:29:33.720 | assuming you have different tools.
00:29:35.800 | So that might be, in some scenarios,
00:29:38.240 | you might want to say, OK, we are
00:29:39.960 | going to select, let's say you have
00:29:45.320 | all of these different tools that your agent can use.
00:29:48.120 | Each one of these dots is a tool.
00:29:50.640 | And in one scenario where, let's say,
00:29:54.360 | the politics route is triggered, what you would do
00:29:57.360 | is you might want to filter your potential tool
00:30:01.400 | use to these specific tools, because you
00:30:03.200 | know that your agent in this scenario
00:30:06.240 | should be using one of those.
00:30:08.640 | It is very useful in many ways.
00:30:11.360 | One, it makes your prompts more concise.
00:30:14.600 | And by making your prompts more concise,
00:30:16.720 | you are saving on latency, you're saving on costs,
00:30:20.840 | and you're probably going to get better performance
00:30:24.360 | or accuracy out of your LLM, because you're
00:30:25.960 | giving it less choice.
00:30:26.880 | And it's always good to constrain the options
00:30:30.640 | that an LLM has.
00:30:32.200 | That will tend to lead to better accuracy and performance.
00:30:36.720 | OK, the other one is the other filter, which would
00:30:41.760 | be the chitchat filter.
00:30:43.680 | So in this scenario, let's say we
00:30:45.680 | want to filter it to these specific tools.
00:30:49.280 | Or in another scenario, let's say, actually, we
00:30:53.200 | know when this happens, when this query triggers,
00:30:58.640 | we actually know that we need to use this specific tool.
00:31:02.320 | So we actually force it to use that tool in that scenario.
00:31:06.200 | Another one might just be that, OK, we
00:31:08.280 | have that user query coming in, so on and so on.
00:31:13.040 | And we think that this user query would probably
00:31:16.480 | benefit from using a particular tool.
00:31:19.360 | So in that scenario, we would maybe just
00:31:22.400 | modify this user query.
00:31:24.240 | It doesn't need to be an alert, but some sort
00:31:28.400 | of little informational thing, like, hey, maybe
00:31:31.200 | you can use this.
00:31:33.040 | Or you can actually, like, programmatically just
00:31:36.760 | reduce the number of tools it has access to.
00:31:39.280 | So you're just going to be like, no, no, no, no, no.
00:31:42.400 | You can't use any of these.
00:31:43.520 | They don't exist.
00:31:44.640 | The only one you can use is this one.
00:31:47.240 | Maybe you can use both of those together.
00:31:50.400 | The things that you can do is--
00:31:52.240 | well, it's actually completely up to you.
00:31:53.860 | Because semantic router is just triggering something.
00:31:58.520 | What it triggers is kind of up to you.
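
For instance, a hypothetical route-to-tools filter — all of these names are illustrative:

```python
ALL_TOOLS = ["news_search", "fact_checker", "weather_lookup", "calculator"]

# Which tools the agent is allowed to see when a given route triggers.
TOOLS_BY_ROUTE = {
    "politics": ["news_search", "fact_checker"],  # filter to a relevant subset
    "chitchat": ["weather_lookup"],               # effectively force one tool
}

def tools_for(query: str) -> list[str]:
    route = router(query)
    if route.name is None:
        return ALL_TOOLS  # no route triggered: leave everything available
    return TOOLS_BY_ROUTE.get(route.name, ALL_TOOLS)
```
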
00:32:01.960 | Now, these are all very agent LLM-specific things.
00:32:07.360 | But we can broaden this out to just our agentic workflow,
00:32:12.040 | or even beyond that.
00:32:13.800 | But let's focus on that.
00:32:15.680 | So let's broaden it out to, OK, in particular scenarios,
00:32:21.400 | you might want to use a different LLM
00:32:23.760 | in different places.
00:32:25.000 | Or you might want to use different system prompts.
00:32:27.120 | Or you might want to use different temperature settings.
00:32:31.080 | So we might get a user query that comes in.
00:32:34.160 | And it's like, OK, I want to write a story.
00:32:36.640 | I don't know.
00:32:40.280 | I want to write a story.
00:32:41.640 | That should be all you know.
00:32:43.800 | Obviously, they want to be a bit more creative.
00:32:46.880 | And I think the vast majority of people
00:32:48.760 | prefer using not really opening eyes models,
00:32:53.480 | more like Anthropic or other providers
00:32:56.040 | for creative writing.
00:32:58.480 | So what you could end up having here
00:33:01.080 | is like an LLM router, where based on particular queries,
00:33:05.200 | you're routing to the models that
00:33:07.240 | are better suited for this particular type of query
00:33:12.640 | from a user.
00:33:13.720 | So you might be going to different LLMs.
00:33:18.080 | Or also, you might just be using different system prompts.
00:33:21.520 | So the system prompts for helping someone out
00:33:24.320 | with their code is probably going
00:33:26.920 | to be fairly different to someone
00:33:31.640 | wanting advice on the ingredients
00:33:35.000 | to cook a nice meal, right?
00:33:38.040 | The different use cases, probably you're
00:33:40.040 | going to require different prompts.
00:33:42.680 | Of course, they don't even need to be that varied, right?
00:33:46.000 | You can have much more specific areas
00:33:49.200 | or more specific system prompts for very
00:33:52.760 | specific different things.
00:33:54.840 | And then, of course, as I mentioned,
00:33:56.400 | there's the temperature and other model settings.
00:33:59.960 | So in a creative writing example,
00:34:02.840 | someone asks you, they want to write a story.
00:34:05.000 | The temperature, you're just going to put that up
00:34:07.800 | to 0.9 or something.
00:34:10.280 | Whereas someone's asking, oh, I really
00:34:12.440 | need help figuring out, I have some code
00:34:16.520 | and I have this problem.
00:34:18.240 | What do I do?
00:34:20.120 | You're just going to turn that down like 0.1 or 0.0 or 0.01,
00:34:26.760 | whatever it actually is.
00:34:29.240 | So that's another-- focus on the conversational side of things,
00:34:33.760 | another thing we can do with it.
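
A hypothetical sketch of that LLM-routing idea — the route names, model names, and temperatures are all illustrative choices, not recommendations:

```python
# Per-route model settings: creative routes get a different provider
# and a high temperature; code-help routes get a low temperature.
LLM_CONFIG = {
    "creative_writing": {"model": "claude-3-5-sonnet-latest", "temperature": 0.9},
    "code_help":        {"model": "gpt-4o", "temperature": 0.1},
}
DEFAULT_CONFIG = {"model": "gpt-4o-mini", "temperature": 0.7}

def settings_for(query: str) -> dict:
    route = router(query)  # falls back to the default when no route triggers
    return LLM_CONFIG.get(route.name, DEFAULT_CONFIG)
```
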
00:34:35.100 | And yeah, I mean, there are just so many of these use cases.
00:34:38.040 | I could go on forever.
00:34:38.960 | I'm not going to.
00:34:39.720 | I'm going to leave it there.
00:34:41.040 | So yeah, I just wanted to very quickly go
00:34:43.840 | through a few different use cases for
00:34:45.440 | the concept of Semantic Router, in particular
00:34:49.080 | with conversations.
00:34:50.440 | So yeah, I mean, I think I've done that.
00:34:52.840 | We're going to go into a lot more detail with many more
00:34:55.960 | of these in the near future.
00:34:59.360 | So for now, I will leave it there.
00:35:01.920 | So just thank you very much for watching.
00:35:04.880 | I hope this was useful and interesting.
00:35:07.600 | So I will see you again in the next one.