
NEW AI Framework - Steerable Chatbots with Semantic Router


Chapters

0:00 New Python Library for Better AI
1:57 Using Semantic Router
2:26 Semantic Router in Python
3:08 Defining Guardrails and Routes for LLMs
4:34 Initializing a RouteLayer
7:39 Using the Router with LangChain Agents
11:47 What else can Semantic Router do
12:40 Final Notes on the Library

Whisper Transcript

00:00:00.000 | Today, I finally get to talk to you about something that I and others have been working
00:00:04.680 | on for a very long time.
00:00:06.660 | That something is one of the secrets to how I build good AI assistants, agents, and simply
00:00:17.520 | more controllable, deterministic dialogue with AI.
00:00:21.640 | That something is what we call a semantic router.
00:00:25.320 | Now a semantic router is something that you can think of as an almost fuzzy but deterministic
00:00:32.000 | layer on top of your chatbots or really anything that is processing natural language.
00:00:39.240 | The main purpose of the semantic router library is to act as a super fast decision making
00:00:45.360 | layer for LLMs.
00:00:47.600 | So rather than asking an LLM which tool it should use, as we do with agents,
00:00:52.460 | which takes a long time,
00:00:54.260 | with semantic router the decision is almost instant.
00:00:58.200 | It's incredibly fast.
00:00:59.860 | And the way that we set it up is more deterministic because we can provide a list of queries that
00:01:05.680 | should trigger a particular response or a particular tool usage or anything we can imagine.
00:01:11.800 | And at the same time, that list of queries is represented within semantic vector space.
00:01:18.780 | So it's deterministic in that it will trigger if we hit one of those queries.
00:01:23.720 | We will also trigger on the queries around those queries, or utterances as we call them, that we
00:01:29.880 | have defined.
00:01:30.880 | And I've been using this specific library for chatbots and agents for the past
00:01:36.040 | two months.
00:01:37.120 | And honestly, the thought for me of agents and chatbots being deployed without this layer
00:01:45.140 | to have more deterministic control over the chatbots, I think is a little bit crazy.
00:01:51.180 | And I just would not ever put a chatbot out there without having a semantic routing layer.
00:01:57.820 | So with all that in mind, let's have a look at how we actually use this library.
00:02:04.520 | So to get started with this semantic router library, we can first check out the repo.
00:02:11.040 | So it's aurelio-labs/semantic-router.
00:02:14.560 | And this gives you everything that you need to get started.
00:02:17.440 | We describe everything there.
00:02:20.040 | But if you really just want to jump straight into it, you can go to our introduction notebook
00:02:24.560 | here.
00:02:25.560 | I'm going to open it up in Colab.
00:02:27.280 | And we will find ourselves here.
00:02:29.040 | So to get started, we just pip install the library.
00:02:33.280 | So right now we're on 0.0.14, which is basically one of the earliest versions.
00:02:39.800 | There's a lot of cool things that we'll be adding soon.
00:02:43.240 | And it's also open source.
00:02:44.980 | So if people want to contribute, they can.
00:02:48.400 | Now, one thing that we have, particularly when using it with Google Colab at the moment,
00:02:54.600 | is that we'll have this annoying little thing that happens where we will need to restart
00:02:59.920 | after installing the prerequisites.
00:03:01.240 | Otherwise, we'll get this attribute error.
00:03:03.720 | So we just need to go restart session.
00:03:07.180 | And then we run again.
00:03:08.920 | And what I'm first going to do is define some routes that we're going to use and we're going
00:03:14.080 | to test against.
00:03:15.080 | So the first one of those is going to be a protective route.
00:03:17.760 | So this is where you would probably want to add some guardrails to your chatbots or agents.
00:03:27.380 | So maybe one of those would be you don't want it to begin talking about politics.
00:03:32.380 | So if a user asks a question that we would define as politics, we want it to trigger
00:03:38.000 | this route.
00:03:39.000 | And we can protect against that and we can return a specific predefined response or just
00:03:45.880 | remind the LLM to tell the user that you cannot talk about politics.
00:03:49.920 | So we define the politics route.
00:03:51.880 | And then we'll just define another one so we can see kind of how they interact.
00:03:55.760 | This one's going to be a general sort of chit chat, small talk route.
00:04:01.840 | How's the weather today?
00:04:02.840 | How are things going?
00:04:03.840 | So on and so on.
00:04:04.840 | Right?
00:04:05.840 | So what we want to do is initialize an embedding model.
00:04:09.140 | And you can either use Cohere or OpenAI.
00:04:12.720 | As I know many of you will be using OpenAI, we'll stick with OpenAI here.
00:04:17.280 | But I would actually recommend trying out Cohere's embedding models.
00:04:21.440 | They do work a little better in most use cases, at least that's what I've found.
00:04:27.440 | So for Cohere, you would go to dashboard.cohere.com.
00:04:30.560 | For OpenAI, we naturally go to platform.openai.com.
00:04:33.560 | We'll get an API key.
00:04:36.600 | And I'm just going to run this cell and it will pop up with a little input box to tell
00:04:41.120 | me to input my API key.
00:04:44.100 | So I'm going to do that.
00:04:45.840 | Now we're ready to initialize what's called a route layer.
00:04:50.680 | So a route layer is essentially a layer containing different routes.
00:04:56.000 | And it handles the decision making process as to whether we should go with one route
00:05:00.880 | or another route or no route.
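The decision described here (pick one route, another route, or no route) can be sketched in plain Python. This is a toy illustration and not the semantic-router API: `embed` is a stand-in bag-of-words counter rather than a real embedding model, and `ToyRoute`, `ToyRouteLayer`, the example utterances, and the 0.3 threshold are all hypothetical choices made for this sketch.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class ToyRoute:
    name: str
    utterances: list


def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyRouteLayer:
    def __init__(self, routes, threshold=0.3):
        self.threshold = threshold
        # Pre-compute one vector per utterance, tagged with its route name.
        self.index = [(r.name, embed(u)) for r in routes for u in r.utterances]

    def __call__(self, query: str):
        q = embed(query)
        best_name, best_score = None, 0.0
        for name, vec in self.index:
            score = cosine(q, vec)
            if score > best_score:
                best_name, best_score = name, score
        # Below the threshold, no route is chosen.
        return best_name if best_score >= self.threshold else None


routes = [
    ToyRoute("politics", ["don't you love politics",
                          "what is your opinion on the president",
                          "who should I vote for"]),
    ToyRoute("chitchat", ["how's the weather today",
                          "how are things going",
                          "lovely weather we're having"]),
]
layer = ToyRouteLayer(routes)
```

Because the toy `embed` only counts words, it matches on word overlap alone. A real embedding model is what lets queries that are merely close in meaning to an utterance trigger the same route.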
00:05:03.000 | There are currently two route layers available in the library.
00:05:06.880 | The main one is the route layer.
00:05:09.020 | This is based on the idea of a pure semantic search.
00:05:14.080 | We also have the hybrid route layer that we're still working on, still improving.
00:05:18.680 | But that will allow us to use both semantic space and also a more term-based traditional
00:05:26.640 | vector space as well.
00:05:28.420 | So that might be particularly useful for specific terminology like in the medical domain, in
00:05:35.200 | the finance domain, and other places as well.
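One common way to combine a semantic score with a term-based score is a weighted blend. The sketch below assumes that approach; the video does not say how the hybrid route layer weights the two, so `hybrid_score` and its `alpha` parameter are hypothetical names used only for illustration.

```python
def hybrid_score(dense: float, sparse: float, alpha: float = 0.5) -> float:
    """Blend a semantic (dense) similarity with a term-based (sparse) one.

    alpha=1.0 uses only the semantic score, alpha=0.0 only the term score.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * dense + (1.0 - alpha) * sparse
```

Lowering `alpha` gives more weight to exact terminology, which is the kind of behavior that would help with specialist vocabulary such as drug names or ticker symbols.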
00:05:38.760 | For now, let's stick with the standard route layer and we can test it.
00:05:43.600 | So I'm going to run these three and let's see what we get.
00:05:48.520 | So don't you love politics?
00:05:52.000 | Our route choice, so this is the route that has been chosen, is the politics route.
00:05:56.920 | This function call field, that's related to our dynamic routes.
00:06:01.080 | We'll talk about that more in the future.
00:06:03.920 | Now this, what we have here with the function call equal to none is what we call a static
00:06:09.320 | route.
00:06:10.320 | How's the weather today?
00:06:11.320 | Okay.
00:06:12.320 | So that's our chitchat.
00:06:13.320 | That obviously triggers our chitchat.
00:06:15.340 | And then I'm interested in learning about Llama 2.
00:06:17.440 | It's not really related to either of the routes that we've defined.
00:06:21.240 | So it returns none.
00:06:22.640 | Now let's go with something else.
00:06:26.320 | Maybe I want to ask about the agent's opinions on a particular political party.
00:06:31.720 | That's something that we don't want people doing in most cases.
00:06:36.440 | So I can say, okay, what do you think about the, in England we have the Labour Party,
00:06:46.760 | for example.
00:06:49.540 | So what do you think about the Labour Party?
00:06:52.220 | See what it says.
00:06:53.220 | Okay, cool.
00:06:54.400 | We have politics.
00:06:55.640 | So we trigger that route.
00:06:58.440 | And then what we do obviously is actually, let me get the route choice from that.
00:07:04.240 | So with our route, what we would do is just some if/else logic.
00:07:10.160 | So if route name equals politics, we would return, hi, sorry, I can't talk about politics.
00:07:25.000 | Please go away or something along those lines, right?
00:07:30.040 | And then we're triggering that.
00:07:31.040 | And obviously we just have like this, if else logic that does different things.
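The if/else logic described here might look like the following minimal sketch. The function name `respond`, the chitchat reply, and the fallback string are hypothetical; in practice the route name would come from the route choice returned by the layer.

```python
from typing import Optional


def respond(route_name: Optional[str], query: str) -> str:
    if route_name == "politics":
        # Guardrail: return a predefined response instead of calling the LLM.
        return "Hi, sorry, I can't talk about politics."
    elif route_name == "chitchat":
        return "Happy to chat! How is your day going?"
    else:
        # No route matched: pass the query through unchanged.
        return f"[forward to LLM] {query}"
```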
00:07:36.220 | But then, okay, this is a very basic, this is the introduction.
00:07:40.040 | Let me just show you very quickly how we might integrate this with a LangChain agent.
00:07:44.960 | So returning to our docs, we have this number three, basic LangChain agent.
00:07:49.200 | Of course, we also have these as well, if you want to check them out, but let's go to
00:07:53.240 | the LangChain agent first.
00:07:54.700 | So I'm going to open again in Colab.
00:07:57.140 | So we have this notebook and I'll go through it in a little more detail later, but we have
00:08:03.160 | this system message.
00:08:04.520 | Okay.
00:08:05.520 | You're a helpful personal trainer, so on and so on.
00:08:08.520 | It also acts like a noble British gentleman.
00:08:12.720 | And I have this little bit here.
00:08:15.400 | Remember to read the system notes provided with your user queries.
00:08:18.380 | This is where I'm going to be inputting the logic from our semantic router.
00:08:22.980 | There are many different ways of doing this.
00:08:24.640 | This is just one of the ways that I quite like, to be honest, it's almost like you add
00:08:29.160 | a suggestive layer to your agents based on the semantic router.
00:08:34.380 | So I've defined an agent here and I'm going to input my query.
00:08:38.640 | I want to know, should I buy Optimum Nutrition whey or MyProtein?
00:08:44.600 | Right?
00:08:45.600 | So I'm talking about whey protein and you can see the response.
00:08:48.800 | Well, it depends, do you prefer your whey with a side of Optimum Nutrition or MP?
00:08:53.880 | So I don't think it even knows what, Oh, okay.
00:08:56.360 | It knows that they're brands.
00:08:58.520 | Cool.
00:08:59.520 | So that's good.
00:09:00.520 | But, you know, I want this assistant to act like a personal trainer that has their
00:09:04.600 | own brand and all these other things.
00:09:06.560 | So what I've done is I've created a semantic router augmented query.
00:09:12.760 | What this has done is taken this query and processed it through the semantic router.
00:09:16.240 | And then we've added this extra logic layer based on what the semantic router says,
00:09:21.400 | which adds different prompts to the user query based on that via the system note.
00:09:27.880 | So in this one, I've added one that talks about, okay, different types of proteins and
00:09:33.100 | products essentially.
00:09:34.800 | And what it does is it says, okay, remember you are not affiliated with any supplement
00:09:39.120 | brands.
00:09:40.120 | We have our own brand, BigAI, that sells the best products like P100 whey protein.
00:09:45.520 | I don't know if anyone will get that.
00:09:47.480 | And it's a stupid joke, but I liked it.
00:09:49.920 | So then the output becomes, why not try the BigAI P100 whey protein?
00:09:56.320 | It's the best, just like me, which is funny.
00:10:02.840 | So we have that.
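The augmentation step just shown, where the chosen route adds a system note to the user query, could be sketched like this. The `SYSTEM_NOTES` mapping, the `supplement_brand` key, the `augment_query` name, and the exact note format are assumptions for illustration; the notebook may structure this differently.

```python
from typing import Optional

# Hypothetical mapping from route name to the note appended to the query.
SYSTEM_NOTES = {
    "supplement_brand": (
        "Remember you are not affiliated with any supplement brands. "
        "We have our own brand, BigAI, that sells the best products "
        "like P100 whey protein."
    ),
}


def augment_query(query: str, route_name: Optional[str]) -> str:
    note = SYSTEM_NOTES.get(route_name)
    if note is None:
        # No matching route: send the query to the agent unchanged.
        return query
    return f"{query} (SYSTEM NOTE: {note})"
```

The agent's system message tells it to read these notes, so the router only suggests behavior and the LLM still writes the final reply.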
00:10:05.600 | And then I have, I should show you the routes actually.
00:10:08.960 | So we have this get time route, which triggers a function; supplement brand, which is the
00:10:14.240 | one we just saw; business inquiry; and product.
00:10:16.760 | And one of those, obviously it's the time route where it's getting the current time
00:10:19.720 | for you and putting into your query, right?
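A route that triggers a function rather than a fixed prompt might be sketched as follows. The `get_time` route name, the `augment_with_time` helper, and the optional `now` parameter (included so the example is repeatable) are hypothetical.

```python
from datetime import datetime
from typing import Optional


def augment_with_time(query: str, route_name: Optional[str],
                      now: Optional[datetime] = None) -> str:
    if route_name != "get_time":
        return query
    # Call the function tied to the route and inject its result into the query.
    current = (now or datetime.now()).strftime("%H:%M")
    return f"{query} (SYSTEM NOTE: The current time is {current}.)"
```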
00:10:22.280 | So without the semantic router, we're just putting this query in.
00:10:27.360 | Okay.
00:10:28.360 | Then we go through our semantic router layer and it augments our query with this.
00:10:33.960 | So then if we go with just the plain query, put that in, we got, it's generally recommended
00:10:39.760 | to allow at least 48 hours of rest and so on and so on.
00:10:42.840 | It's not specific to the current time. With this semantic router powered augmentation,
00:10:47.800 | we get this.
00:10:48.800 | Why not train again at exactly 8:02 tomorrow?
00:10:52.240 | (That's the time that I asked this question, but the day before.) That way you give your
00:10:56.880 | body a good rest, unless you're into those 24-hour gym life goals. Which is a bit cringey.
00:11:02.720 | But anyway, so you see that through the semantic router, we're able to suggest to
00:11:08.200 | our agent to get this additional information like we have done here, or to
00:11:13.720 | suggest to the agent to act in a particular way.
00:11:17.080 | And then we have these other ones, for example: do you do training sessions? Without
00:11:21.760 | the augmentation, there's nothing relevant here.
00:11:26.480 | It's generally recommended, uh, actually, where is it?
00:11:29.120 | I'm here to provide guidance and support, not personal training sessions. With the augmentation:
00:11:33.160 | Why, of course we offer these premium training sessions at just $700 per hour, which is what
00:11:39.960 | I told it to say.
00:11:41.600 | Now that's an example of what semantic router can do.
00:11:44.720 | It's just one example.
00:11:45.720 | There are many different things that you can do with this.
00:11:47.800 | What I've just shown you there is using these routes to remind the agent of particular information
00:11:52.200 | or to, you know, call a function, but we can also use it to protect against certain queries.
00:11:59.320 | We can use it to basically do function calling without the super slow agent processing time
00:12:05.760 | that function calling requires.
00:12:07.260 | And we can also use this, and this is one of the things that I use it for a lot, as another
00:12:12.640 | approach to RAG.
00:12:13.640 | You know, in the past I've spoken about naive RAG, which is where you're performing
00:12:17.360 | a search on every query. You have agent-based RAG, which is slower, but it can usually do
00:12:23.600 | a bit more.
00:12:24.600 | It's more powerful.
00:12:25.600 | We also have this, which is kind of like semantic router RAG or semantic RAG, and it
00:12:30.800 | takes from both.
00:12:31.960 | It can be very powerful like your agent, but it can also be very fast like your naive RAG.
00:12:36.640 | So it really gets the best of both and it's generally my preferred way of doing it.
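One plausible reading of this router-based RAG is that the router decides when a search is worth running, so retrieval happens only for queries that hit a retrieval route. The sketch below assumes that reading; `RETRIEVAL_ROUTES`, the `product` route name, and `retrieve_if_routed` are hypothetical, and `search` stands in for whatever vector search function is in use.

```python
from typing import Callable, List, Optional

# Hypothetical set of routes that should trigger a knowledge-base search.
RETRIEVAL_ROUTES = {"product"}


def retrieve_if_routed(query: str, route_name: Optional[str],
                       search: Callable[[str], List[str]]) -> List[str]:
    """Run retrieval only when the router picked a retrieval route."""
    if route_name in RETRIEVAL_ROUTES:
        return search(query)
    # Unlike naive RAG, other queries skip the search entirely.
    return []
```

Compared with agent-based RAG, the decision to search costs one fast route lookup rather than an extra LLM call.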
00:12:41.080 | So that is the semantic router.
00:12:45.240 | As I said, I and my team have been implementing this across many projects.
00:12:52.120 | So we, you know, we've been implementing it, seeing what works, seeing what doesn't work
00:12:56.240 | and fine tuning it based on that.
00:12:58.760 | And I think what we have here is the first version, okay, 100%, this is still a very
00:13:04.200 | early version, but it works incredibly well.
00:13:08.640 | It's truly getting us that final 20% of the AI behaviors that we need in order to make
00:13:15.840 | our agents something that we can actually go ahead and use in production, which is very
00:13:20.760 | cool to see.
00:13:22.040 | And we want other people to be able to use this as well, which is why you're seeing this
00:13:27.480 | right now.
00:13:28.480 | I personally, I'm very excited about releasing this.
00:13:31.480 | So I hope that this is exciting for at least a few of you.
00:13:36.120 | I hope some of you get to use it and, you know, please let me know what you think.
00:13:41.160 | If you are interested in contributing, it's all open source so you can, and I'll be doing
00:13:46.840 | a few more videos for sure on how we use this, how to make the most of the semantic router
00:13:53.600 | and especially the other features that I haven't spoken about yet, such as dynamic routing,
00:13:57.960 | the hybrid layer, those are all very exciting things and we have many more exciting things
00:14:03.940 | coming as well.
00:14:05.200 | So I hope all of this has been exciting and interesting, but for now I will leave it there.
00:14:11.340 | So thank you very much for watching and I will see you again in the next one.
00:14:16.720 | [Outro Music]
00:14:16.720 | [End]