back to index

Steerable AI with Pinecone + Semantic Router


Chapters

0:0 Pinecone and Semantic Router
1:53 Finding Code for Pinecone
4:12 Getting Routes from Hugging Face
7:36 Loading Route Layers from Pinecone

Whisper Transcript | Transcript Only Page

00:00:00.000 | Today we're going to be taking a look at a new integration in the Semantic Router library,
00:00:05.860 | which is with Pinecone.
00:00:07.420 | Now naturally I'm very involved with both Semantic Router and Pinecone, so it seems
00:00:13.360 | logical that those two will have come together at some point and now they have.
00:00:18.840 | The purpose behind pulling these two technologies together is primarily that of scalability.
00:00:29.180 | So with Pinecone, you obviously have huge potential scale in what you can do and that
00:00:35.520 | translates over to Semantic Router as well.
00:00:38.520 | The number of utterances and routes that you can sort in Semantic Router at the moment
00:00:44.720 | is still pretty high because we're sorting everything locally, but with Pinecone you
00:00:49.880 | can just go incredibly high scale, which is exciting for many reasons, but for me mostly
00:00:58.240 | to see what sort of use cases people come up with.
00:01:01.300 | Now for sure I think you can easily create tons of routes and tons of utterances and
00:01:08.420 | get relatively high scale from that, but I'm sure there are many other use cases out there
00:01:12.620 | as well that I have just not even thought about yet.
00:01:16.220 | So that is scale.
00:01:17.220 | Now the other one is kind of like ease of use and persistence.
00:01:22.280 | So with Pinecone, everything is within your Pinecone index, so you can then begin loading
00:01:29.760 | up your route layers from Pinecone, which makes moving your route layers across different
00:01:38.280 | places and just even from one session into another session much easier than when you're
00:01:44.920 | doing it locally with the local index.
00:01:48.360 | So it's also very exciting.
00:01:50.560 | Now let's jump straight into it.
00:01:53.480 | So I'm going to go to the Semantic Router library.
00:01:56.280 | We have the docs here, we have, well, there's a few places we can learn how to do this.
00:02:03.000 | So I'm going to first, if you want to just like a very basic example, you can come into
00:02:08.000 | here.
00:02:09.000 | So the indexes Pinecone example here, or the one we're going to walk through is this Pinecone
00:02:16.280 | and scaling example, which just has a little bit more in it and a few more routes.
00:02:20.920 | It's not, it isn't very high scale.
00:02:23.220 | I need to add more to this, but it should be able to scale pretty high.
00:02:28.700 | So I'm going to open the Colab notebook and here we are.
00:02:32.040 | So as I mentioned in this small intro here, you could literally scale this to thousands
00:02:39.720 | or even millions of routes if you wanted to.
00:02:44.240 | And I think for a lot of use cases that probably won't be necessary, but what I have noticed
00:02:50.660 | is that Semantic Router can be used for a lot more than what I originally thought it
00:02:54.800 | could be used for.
00:02:56.340 | I've seen it be used for something that we've been building is a semantic splitter for more
00:03:03.600 | intelligent chunking of documents and conversations.
00:03:07.660 | And once you start chunking conversations, you can do kind of interesting things.
00:03:11.800 | I've also, and then this is something I'll talk about very soon, seeing that we can also
00:03:17.260 | apply this splitting technique to video.
00:03:21.540 | So you can chunk video frames based on what is within the video, which is pretty interesting
00:03:27.240 | and exciting.
00:03:29.800 | And another thing that we're going to cover very soon is basically content moderation
00:03:36.360 | for images.
00:03:37.360 | So you can kind of, you can imagine the route that I might be going down there, but we're
00:03:42.880 | going to talk about that pretty soon.
00:03:45.280 | And in those use cases, it might make sense sometimes to use larger scale, and I'm sure
00:03:53.640 | there are many other use cases out there that I just haven't come across yet.
00:03:58.360 | So we're going to start here.
00:03:59.360 | We're going to install these.
00:04:01.460 | So we're using the HuggingFace datasets library, and we're also installing semantic router,
00:04:07.920 | the local extras, and also the Pinecone extra.
00:04:12.320 | Okay.
00:04:13.560 | And we're going to come down to here.
00:04:14.640 | So I'm going to download this dataset from HuggingFace datasets, and it's just a dataset
00:04:19.200 | containing some routes that we're going to be using.
00:04:21.600 | So I think there's something like a hundred and no, 150 utterances maybe.
00:04:28.240 | Okay.
00:04:29.240 | So there's 50 routes that we have here and roughly, yes, three in each route.
00:04:33.700 | So about 150 just over.
00:04:37.460 | So you can see here, we have one route, it's the ones that we've been seeing before.
00:04:42.360 | And then basically generate a few more with GPT.
00:04:46.960 | So there's a few routes in there.
00:04:49.460 | Now to generate or to take this dataset and convert those, like basically this into a
00:04:56.860 | route, all we need is this, it's pretty, yeah, it's pretty straightforward.
00:05:02.900 | So we now have like 50 routes, they look kind of like this, and we can go ahead and, well,
00:05:12.400 | we need to initialize a route layer and to initialize a route layer, what we typically
00:05:16.380 | need is a encoder and our routes.
00:05:20.920 | So we can get our encoder here.
00:05:23.160 | We're going to use the local hugging face encoder.
00:05:27.820 | And with that, we have both our encoder and routes, but we're also using a Pinecone index.
00:05:35.540 | So we also need to initialize that.
00:05:38.020 | And we will also initialize our route layer with the Pinecone index.
00:05:42.640 | So it's something slightly different here.
00:05:45.020 | And this is a new feature within the library as well.
00:05:49.380 | You can also, if you want to, initialize it with the local index, but by default, it will
00:05:56.260 | initialize with local index.
00:05:58.020 | So yeah.
00:05:59.900 | So we do need to get a Pinecone API key.
00:06:03.500 | This does need to use Pinecone serverless.
00:06:06.800 | So I'm going to go and get that.
00:06:08.680 | So you need to go to app.pinecone.io.
00:06:10.460 | Okay.
00:06:11.460 | And we're going to go to API keys and just copy that.
00:06:15.620 | And I'll enter in here.
00:06:18.300 | Cool.
00:06:19.640 | So you will get this warning saying the index could not be initialized.
00:06:23.220 | That's fine.
00:06:24.220 | It's because we were initializing the index without any routes being attached.
00:06:29.060 | It will be initialized correctly when we run this.
00:06:34.060 | Okay.
00:06:35.300 | So because we're using Pinecone, it will take a moment for the index to be created.
00:06:39.740 | And then the embeddings will need to be created and sent across the Pinecone, which actually
00:06:44.600 | did not take long at all.
00:06:46.180 | Let's very quick, let me double check if we come into here and just refresh.
00:06:55.060 | Cool.
00:06:57.660 | So we can see down here, we have our semantic router index.
00:07:01.420 | We can go in and it's just a 384 dimensional vectors because we're using a mini LM model
00:07:09.140 | with the hugging face encoder.
00:07:10.940 | And we only have 154 vectors here.
00:07:14.180 | Very small amount, but good enough for just showing you how to use it.
00:07:21.540 | So with that in place, whenever we call our route layer, it's actually going to be going
00:07:26.500 | via Pinecone rather than the local index now.
00:07:29.700 | So we can run this.
00:07:31.860 | Okay.
00:07:32.860 | And we can see, okay, that triggers the chitchat route.
00:07:36.900 | Now, one of the benefits to Pinecone is that we have this persistent index, okay?
00:07:44.620 | We have all that routes that are persisted within Pinecone.
00:07:48.600 | And well, what we can do is just load everything up from Pinecone, wherever we are from a new
00:07:53.820 | environment, and it will just work, which is pretty nice.
00:07:57.780 | So I'm going to go ahead and do that.
00:08:00.100 | I'm going to go ahead and delete the route layer, the index and the routes that we created.
00:08:05.980 | So that we can't cheat and use those.
00:08:09.660 | And then I'm going to initialize a new Pinecone index.
00:08:13.820 | Now you can see I'm also passing in the index name here.
00:08:16.900 | By default, the index name is this, right?
00:08:21.220 | But I just want to show you how you can initialize it with custom index name if you prefer.
00:08:26.060 | So you could put anything you want here.
00:08:28.420 | So I could call this like the Pinecone demo, for example.
00:08:32.220 | I think it would have to be like this.
00:08:35.340 | But I've already created it, and it's called index, so I'm going to use that again.
00:08:41.340 | So once I have done that and connected to my index, I can get the previous routes that
00:08:47.740 | are within the index already.
00:08:49.300 | So I just run this, and that will get all of them.
00:08:54.340 | So the format that it provides them back to us in is actually more of -- it's on more
00:08:59.580 | of an utterance level.
00:09:01.740 | So we need to convert -- you can see there's two utterances here for the single chit chat
00:09:07.180 | route.
00:09:08.220 | So we need to convert this into a format that we can use to reinitialize our route layer.
00:09:13.980 | And for that, we need routes.
00:09:15.740 | So we're going to go ahead, we're going to create a routes dictionary, loop through this
00:09:21.700 | and create, well, a ton of dictionary versions of these routes that we can then use to initialize
00:09:28.060 | a list of routes.
00:09:29.060 | So I'm going to do that, and take a look at the routes dictionaries.
00:09:34.700 | So you can see, if we go up to the top of it, you see here there's this route, cybersecurity,
00:09:42.860 | and then we have the utterances.
00:09:43.860 | We have chit chat here, right, and so on and so on.
00:09:49.340 | Cool.
00:09:50.340 | Jokes.
00:09:51.340 | We're going to use that in a moment.
00:09:52.340 | So we're going to come to here and transform these into a list of route objects.
00:09:56.740 | Okay, so I'm just iterating through the routes dictionary and pulling out the route name
00:10:03.660 | and the utterances and mapping them to route objects.
00:10:07.420 | Okay, so now we get, you know, the same again.
00:10:11.940 | Now what we want to do is initialize our route layer.
00:10:15.980 | So again, we're just, at this point, it's basically the same as what we did before.
00:10:20.220 | I'm showing that it actually works.
00:10:21.980 | So we have our route layer, we have an encoder, the new routes that we loaded from Pinecone,
00:10:26.700 | and obviously our Pinecone index, and now we can test it again.
00:10:30.260 | So we'll, well, you can see it already works here.
00:10:32.860 | I'll rerun them anyway.
00:10:35.380 | So yeah, you get these.
00:10:37.100 | It correctly identifies this as a joke, this one as a joke, and this one as chit chat.
00:10:43.060 | Okay.
00:10:44.060 | So yeah, I mean, that is it.
00:10:46.540 | It's very, yeah, very simple integration.
00:10:50.900 | We tried to make it as simple as possible, to be honest.
00:10:54.580 | But as I said, it unlocks a lot of scale use cases and the just ability to persist your
00:11:03.240 | route layers, you know, wherever you are, which is a nice little thing to have.
00:11:08.100 | So yeah, that's it for this video.
00:11:11.580 | I hope this has been useful and interesting.
00:11:14.780 | So thank you very much for watching and I will see you again in the next one.
00:11:19.100 | [Music]