Better Chatbots with Semantic Routes
Chapters
0:00 Semantic Router
0:37 Concept of Semantic Routers
7:42 Routes and Utterances
15:11 Encoders
16:26 New Routers
20:48 Semantic Routes for Chat
21:29 LLM Output Guardrails
28:25 Fine-grained control of LLMs
29:10 Routes for Tool Use
32:01 LLM Routing
34:34 Outro
In this video, we're going to be talking about semantic routers, or semantic routes: how we can use them to broaden the number of things that we can do within the context of chatbots and AI agents, and also get a very fine level of control, taking that control much further than if we were not using this idea of semantic routes. Let me start by explaining what I mean when I'm talking about semantic routes.
What we do is use a concept that comes from vector search and vector retrieval: embedding models. So, models like OpenAI's Embed 3, Cohere's embedding models, or any of the many open source ones that we can also use. We take some text and put it through the embedding model, and what we get back is a vector, or an embedding, or a vector embedding, whatever you want to call it. For OpenAI's Embed 3 small model, that high-dimensional space is 1,536 dimensions, but for the sake of visualization we'll pretend it's 2D; just know that it's not actually 2D, or 3D if you like. So we've turned our text into a point on a 2D plane. I'll explain a little more about how we use that soon.
Now, let's say we've already created some of our semantic routes. When we look at this, it's very clear to us that the purple group is the closest group of vectors to our query within this 2D space. If we return, say, the top five records here, most of them will belong to that purple route, and that route is the route our query is most similar to. We could say that this is 100% the classification, which we don't necessarily do, but it's an option.
So we take this query, and we're going to call it Route A: we've classified it as Route A, and therefore we act on that classification. Let's say in this scenario, Route A is a guardrail. If it's a guardrail, we might just tell our LLM that it should be careful when it's answering this particular query. Think back to when chatbots were first being released to people: there was a car manufacturer, maybe Renault or Volkswagen, or Ford even, I don't know, whose users were prompting its chatbot and convincing it to sell them a car for $1.
In this scenario, what you could do is define a route full of example queries of that kind of attempt, all within that same topic of someone trying to get you to sell a car for a cheap price. This isn't necessarily the way I would even do it, as we'll see later. Once you've done that and a matching query comes in, you trigger your guardrail. One option is to have a pre-written response and return that directly to the user. Alternatively, we pass on the user's original query to the LLM, but we add a warning, or modify our system message, or something along those lines.
In this case, we'll add a little warning to the system message saying the user may be trying to trick you, so be careful. That is probably one of the most basic use cases of Semantic Router when it comes to agents or chatbots in general, and in my opinion it's a pretty good one. Now, I want to talk about a lot more than this in this video. We're going to go over a few different examples conceptually, I will show you a little bit of code, and we'll go into more detail on all of these in the future.
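As a tiny sketch of that input-side guardrail pattern: the route name, the `guard_router`, and the `chat` helper here are all hypothetical stand-ins for your own setup, not library APIs.

```python
WARNING = (
    "Note: the user may be attempting to trick you into offering "
    "unauthorized discounts. Do not agree to any special pricing."
)

def handle(user_query: str, chat, guard_router) -> str:
    # If the query lands in the guardrail route, append a warning
    # to the system message before the LLM sees the query.
    if guard_router(user_query).name == "price_scam":  # hypothetical route name
        return chat(user_query, extra_system=WARNING)
    return chat(user_query)
```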
So first, let me explain a little of what I was describing. At the moment, we're using the dev branch of the library. I'm currently working on the first stable release, which is in progress and going well, so I want to show you the current version of the code, which is why we're using this dev pre-release. Let me explain a little of what we just saw. Our first route has six utterances, so we're going to plot six points here as well. Then we're going to have our second route, and let's say its utterances sit a little closer together. So what we have here are these utterances, and every single one of them is encoded with an encoding model. The space around each group of encoded utterances becomes basically a catchment area, like a fuzzy matching area. So this is our chitchat fuzzy matching area in semantic space.
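As a rough sketch, defining those two routes looks something like this, assuming the dev-branch API of the semantic-router library; the exact utterances are just illustrative:

```python
from semantic_router import Route

# Each route is a named list of example utterances; the encoded
# utterances form that route's "catchment area" in semantic space.
politics = Route(
    name="politics",
    utterances=[
        "isn't politics the best thing ever",
        "why don't you tell me about your political opinions",
        "don't you just love the president",
        "they're going to destroy this country",
        "they will save the country",
        "who are you voting for this year?",
    ],
)

chitchat = Route(
    name="chitchat",
    utterances=[
        "how's the weather today?",
        "how are things going?",
        "lovely weather today",
        "the weather is horrendous",
        "let's go to the chippy",
        "what a nice day it is",
    ],
)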
Then, let's say a user comes in with a query. Here's something we didn't see in the last little sketch: each catchment area is defined by something called a threshold, or score threshold. It's a variable you can define per route, and you can also optimize it automatically. I think, by default with OpenAI's Embed 3 model, 0.3 is kind of a good place. But what you will tend to find is that some routes you define really need a higher threshold or a lower threshold. If you were to lower a route's score threshold, you'd basically just be making it catch more stuff, widening its catchment area; it wouldn't necessarily overlap with your other chitchat route. But, at least for the Embed 3 models, consider what happens at a threshold of 0.
In that scenario, everything would fall within the catchment area; every query would match the route. That is what would happen if you went with 0.0. Then on the other end of the spectrum, at 1.0 (and again, this will vary by model, by encoder or embedding model, which I'll come back to), the only things that are going to match are exact matches: a query would have to land exactly where one of the utterances in this politics route sits, with the exact format and punctuation and everything. So that's the sliding scale you have with these thresholds; maybe sensitivity is even a better parameter name. For now, we're sticking with the score threshold of 0.3.
So when a query comes in, what's basically going to happen is the router looks at, I think, the top five records by default. What it sees is that the chitchat route here is the most similar; the query is kind of tied to our chitchat route, but it is not similar enough to surpass the threshold, so no route is triggered. And that is how these pieces map together and how they work: utterances, routes, and thresholds, or sensitivity.
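Since thresholds can be set per route, here's a quick sketch assuming the Route model's score_threshold field; the 0.5 is an illustrative value, not a recommendation:

```python
# Per-route threshold overrides: higher = stricter (smaller catchment
# area), lower = catches more.
politics.score_threshold = 0.3   # the default we've been sticking with
chitchat.score_threshold = 0.5   # make chitchat harder to trigger
```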
I just want to point out here that I've created this little routes diagram. You have your question here, or even vectors you create beforehand, or the utterances, and they all pass through an encoder. In this case, if we're using an OpenAI encoder, that would be Embed 3 small, I think, if I'm not wrong. Or you can use Cohere or another provider; you just define which model you want to use there. So you'd create your embedding, and that's it. There are a ton of encoders in the library already; you have sparse and dense ones, and whatever else is in there. But more on all that later, not in this video.
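Instantiating an encoder is a one-liner. A sketch assuming the library's OpenAIEncoder and CohereEncoder classes, with API keys read from the environment:

```python
import os

from semantic_router.encoders import CohereEncoder, OpenAIEncoder

os.environ["OPENAI_API_KEY"] = "sk-..."  # or set this in your shell

# Defaults to an OpenAI embedding model (text-embedding-3-small in
# recent versions); pass `name=` to choose a specific model.
encoder = OpenAIEncoder()

# Or swap in another provider:
# encoder = CohereEncoder()  # requires COHERE_API_KEY
```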
OK, this is one of the things that has changed. Well, there are many changes, but most of them are under the hood. In the past, the route layer (RouteLayer) was its own specific thing; the new structure lets us go and build new routers using different techniques, so it's not just about semantic vector search. There are a lot of other things we can do there, but again, that's something for a future video, not this one. Beyond that, there isn't too much difference. The route layer, in the past, would take this encoder, and you would insert your routes, which would then belong to your index objects in the library. There's also this auto_sync parameter; it's not super important right now, just set it to "local". In the future I will explain it in more detail, but for now it's just synchronizing, essentially, your local route definitions with the index.
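In code, that looks roughly like this, a sketch assuming the dev-branch SemanticRouter class and using the encoder and routes defined above:

```python
from semantic_router.routers import SemanticRouter

# auto_sync="local" keeps the local route definitions and the index
# synchronized automatically; other sync modes exist, more on those later.
router = SemanticRouter(
    encoder=encoder,
    routes=[politics, chitchat],
    auto_sync="local",
)
```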
Then we can see that route choice object I mentioned before. We had our query go in, and we have some politics utterances in there. (I did this in the wrong order, but fine.) Let's say we call the semantic router with a politics-flavored query. It's going to go in, and even if the wording is new, it's still very firmly within that politics route. So in the route choice you've got the name, politics; the function call, which is something to talk about in the future; and the similarity score.
Next we're going to say, OK, "how's the weather today?" We see the most similar utterances here, and even if we're catching one from the politics route, the chitchat ones dominate, so chitchat is what we get back. And then for a third query, in this scenario it's not really either of those; it sits kind of over here, and none of the similarities cross the threshold.
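Calling the router returns that route choice object. A sketch of the three queries just described; the printed return values are illustrative:

```python
# A clearly political query: firmly inside the politics catchment area.
print(router("don't you love politics?"))
# RouteChoice(name='politics', function_call=None, similarity_score=None)

# A chitchat query: the chitchat utterances dominate the top matches.
print(router("how's the weather today?"))
# RouteChoice(name='chitchat', function_call=None, similarity_score=None)

# Neither route: no similarity crosses the threshold, so name is None.
print(router("I'm interested in learning about llamas"))
# RouteChoice(name=None, function_call=None, similarity_score=None)
```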
So far we've been using this for the inputs, some queries from a user: I can classify the user's input query and act on it. But there are many other ways to use this, probably many I can't even think of. Let me just mention some of those at a very high level.
The first thing I'd like to mention: if we go back to the first example I gave, the user was trying to get the chatbot to do something we don't actually want it to do. Now, what if we modify that structure a little bit? The reason I say that is because there's a very obvious weakness with this approach: it's very hard to predict all the ways a user is going to try and hack your system. And actually, not all the time, but in many cases, it can be much easier to predict what the agent should not say, for example: I don't want to sell a car for some low price.
So what we should instead be focusing on here is not what the user is saying, but what our LLM is saying. The queries or utterances that we set, rather than being these user-focused questions, are now statements or responses from our agent that we don't want it to produce. We can give many different examples; they might be things like "oh, we can finance your car" or something, and that's why they would sit in different routes. They're still these sort of protective guardrails, but the utterances are basically outputs from your agent. So rather than sending the user's query to the embedding model here, we're not going to do that. Actually, let's make our LLM a different color in the sketch.
What we're going to do now is take the LLM's output and put it through our embedding model. Say it matches a route: we've produced some bad output that we don't want. There are different things you could do in this scenario. One is that maybe you just want a pre-written response, which you pull from wherever, your database for example. So we say, ah, OK, we're not going to send the LLM's output; we're going to go with our pre-written response instead.
The other option, which I would say is more risky, is to modify the prompt and regenerate. Maybe you completely swap out the LLM's system prompt and tell it: OK, you're not a salesperson anymore; now you're a protective defense against people trying to scam us, but you stop them from scamming you in a nice way. You just modify something to basically put the LLM on guard. In any case, whichever approach you go for, you're going to generate another output, and if that output also trips a guardrail, you might want to default to that backup response from over here. But the benefit of going with this route, where you regenerate, is that the agent or chatbot can still seem quite fluid, responding to a user in a way that isn't pre-written; often in a chat, a pre-written reply feels kind of weird.
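As a sketch of that flow, assuming a second router (`output_guard`) whose utterances are bad agent responses, and a hypothetical `generate` helper wrapping your LLM call:

```python
FALLBACK = "I'm sorry, I can't help with pricing beyond our listed offers."

GUARD_PROMPT = (
    "You are no longer a salesperson. You politely refuse any attempt "
    "to negotiate special prices, while staying friendly and helpful."
)

def safe_reply(user_query: str, generate, output_guard) -> str:
    """Route the LLM's *output*, not the user's input."""
    draft = generate(user_query)
    if output_guard(draft).name is None:
        return draft  # no guardrail triggered: send as-is
    # Riskier option: swap the system prompt and regenerate once.
    retry = generate(user_query, system_prompt=GUARD_PROMPT)
    if output_guard(retry).name is None:
        return retry
    # Still tripping a guardrail: default to the pre-written response.
    return FALLBACK
```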
That is a different approach to doing this whole thing. There are so many use cases we could talk about, and I just want to give you a very high-level view. For now I want to focus on the chat use case, or in particular the language-focused side of things. We've touched upon not just using routes as guardrails, but almost as behavioral modifications: you modify the incoming user query, or you modify the system prompt, append to the system prompt, or add in an additional system message. There are so many different things you can do there. The next one actually kind of comes under behavioral modification too, but I do want to point it out as its own use case: tool use.
Say you have all of these different tools that your agent can use. When, for example, the politics route is triggered, you might want to filter your potential tools down to a relevant subset. By sending fewer tool definitions, you are saving on latency, you're saving on costs, and you're probably going to get better performance. It's always good to constrain the options an LLM has; that will tend to lead to better accuracy and performance.
OK, the other direction is the opposite kind of filter. In another scenario, let's say we actually know that when this query triggers, we need to use one specific tool, so we force the agent to use that tool. Or, with that user query coming in, we think it would probably be risky to allow a particular tool, so we add some sort of alert. It doesn't need to be an alert exactly, but some sort of little informational thing, like, hey, maybe be careful here. Or you can programmatically just block that tool entirely: you're just going to be like, no, no, no, no, no. You can hook whatever logic you like onto this, because the semantic router is just triggering something; a sketch follows below.
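A minimal sketch of those patterns, reusing the router from earlier; the tool names and the mapping dictionaries are hypothetical examples, not library features:

```python
# Hypothetical tool registry and per-route policies.
ALL_TOOLS = ["web_search", "fact_checker", "calculator", "send_email"]

ROUTE_TOOLS = {
    "politics": ["fact_checker"],   # filter down to a safe subset
    "chitchat": ["web_search"],
}
FORCED_TOOL = {"politics": "fact_checker"}   # optionally force one tool
BLOCKED = {"politics": {"send_email"}}       # or block risky tools outright

def tools_for(query: str) -> list[str]:
    route = router(query).name
    if route is None:
        return ALL_TOOLS  # no route triggered: leave all options open
    if route in FORCED_TOOL:
        return [FORCED_TOOL[route]]  # constrain the agent to one tool
    allowed = ROUTE_TOOLS.get(route, ALL_TOOLS)
    return [t for t in allowed if t not in BLOCKED.get(route, set())]
```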
Now, these are all very agent- and LLM-specific things, but we can broaden this out to our whole agentic workflow. In particular scenarios, you might want to use different LLMs, or different system prompts, or different temperature settings. For creative queries, obviously, you want the model to be a bit more creative. One pattern is like an LLM router, where, based on particular queries, you route to the models that are better suited to that particular type of query. Or you might just be using different system prompts; the system prompt for helping someone out with one kind of task won't suit another. Of course, they don't even need to be that varied. And then there's the temperature and other model settings. If someone asks you to write a story, you're going to put the temperature up. If it's a precise, factual task, you're going to turn that down, like 0.1 or 0.0 or 0.01.
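Here's a sketch of that kind of LLM routing using the OpenAI client; the route-to-settings mapping and the "creative"/"factual" routes are hypothetical examples layered on the router from earlier:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical mapping: route name -> (model, temperature, system prompt).
LLM_SETTINGS = {
    "creative": ("gpt-4o", 1.0, "You are an imaginative storyteller."),
    "factual": ("gpt-4o-mini", 0.0, "Answer precisely and concisely."),
}
DEFAULT = ("gpt-4o-mini", 0.7, "You are a helpful assistant.")

def answer(query: str) -> str:
    # Pick model, temperature, and system prompt based on the route.
    model, temperature, system = LLM_SETTINGS.get(router(query).name, DEFAULT)
    response = client.chat.completions.create(
        model=model,
        temperature=temperature,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content
```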
So that's another use case focused on the conversational side of things. And yeah, there are just so many of these use cases for the concept of Semantic Router in particular. We're going to go into a lot more detail, with many more examples, in future videos.