Steerable AI with Pinecone + Semantic Router
Chapters
0:00 Pinecone and Semantic Router
1:53 Finding Code for Pinecone
4:12 Getting Routes from Hugging Face
7:36 Loading Route Layers from Pinecone
00:00:00.000 |
Today we're going to be taking a look at a new integration in the Semantic Router library, 00:00:07.420 |
Now naturally I'm very involved with both Semantic Router and Pinecone, so it seems 00:00:13.360 |
logical that those two would come together at some point, and now they have. 00:00:18.840 |
The purpose behind pulling these two technologies together is primarily that of scalability. 00:00:29.180 |
So with Pinecone, you obviously have huge potential scale in what you can do and that 00:00:38.520 |
The number of utterances and routes that you can store in Semantic Router at the moment 00:00:44.720 |
is still pretty high because we're storing everything locally, but with Pinecone you 00:00:49.880 |
can just go incredibly high scale, which is exciting for many reasons, but for me mostly 00:00:58.240 |
to see what sort of use cases people come up with. 00:01:01.300 |
Now for sure I think you can easily create tons of routes and tons of utterances and 00:01:08.420 |
get relatively high scale from that, but I'm sure there are many other use cases out there 00:01:12.620 |
as well that I have just not even thought about yet. 00:01:17.220 |
Now the other one is kind of like ease of use and persistence. 00:01:22.280 |
So with Pinecone, everything is within your Pinecone index, so you can then begin loading 00:01:29.760 |
up your route layers from Pinecone, which makes moving your route layers across different 00:01:38.280 |
places and just even from one session into another session much easier than when you're 00:01:53.480 |
So I'm going to go to the Semantic Router library. 00:01:56.280 |
We have the docs here, we have, well, there's a few places we can learn how to do this. 00:02:03.000 |
So first, if you just want a very basic example, you can come into 00:02:09.000 |
So the indexes Pinecone example here, or the one we're going to walk through is this Pinecone 00:02:16.280 |
and scaling example, which just has a little bit more in it and a few more routes. 00:02:23.220 |
I need to add more to this, but it should be able to scale pretty high. 00:02:28.700 |
So I'm going to open the Colab notebook and here we are. 00:02:32.040 |
So as I mentioned in this small intro here, you could literally scale this to thousands 00:02:44.240 |
And I think for a lot of use cases that probably won't be necessary, but what I have noticed 00:02:50.660 |
is that Semantic Router can be used for a lot more than what I originally thought it 00:02:56.340 |
One thing that we've been building with it is a semantic splitter for more 00:03:03.600 |
intelligent chunking of documents and conversations. 00:03:07.660 |
And once you start chunking conversations, you can do kind of interesting things. 00:03:11.800 |
I've also seen, and this is something I'll talk about very soon, that we can also 00:03:21.540 |
So you can chunk video frames based on what is within the video, which is pretty interesting 00:03:29.800 |
And another thing that we're going to cover very soon is basically content moderation 00:03:37.360 |
So you can kind of, you can imagine the route that I might be going down there, but we're 00:03:45.280 |
And in those use cases, it might make sense sometimes to use larger scale, and I'm sure 00:03:53.640 |
there are many other use cases out there that I just haven't come across yet. 00:04:01.460 |
So we're using the HuggingFace datasets library, and we're also installing semantic router, 00:04:07.920 |
the local extras, and also the Pinecone extra. 00:04:14.640 |
So I'm going to download this dataset from HuggingFace datasets, and it's just a dataset 00:04:19.200 |
containing some routes that we're going to be using. 00:04:21.600 |
So I think there's something like a hundred and no, 150 utterances maybe. 00:04:29.240 |
So there's 50 routes that we have here and roughly, yes, three in each route. 00:04:37.460 |
So you can see here, we have one route, it's the ones that we've been seeing before. 00:04:42.360 |
And then basically generate a few more with GPT. 00:04:49.460 |
Now to generate or to take this dataset and convert those, like basically this into a 00:04:56.860 |
route, all we need is this, it's pretty, yeah, it's pretty straightforward. 00:05:02.900 |
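As a rough sketch of that conversion, assuming each dataset row is a dictionary with a `name` and a list of `utterances`: the field names and sample rows below are illustrative, and the small `Route` dataclass is only a stand-in for the library's own `Route` class so the snippet runs without any installs.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    """Stand-in for the library's Route (assumption: only name + utterances)."""
    name: str
    utterances: list[str] = field(default_factory=list)


# Hypothetical rows shaped like the dataset described above.
rows = [
    {"name": "chitchat", "utterances": ["how's the weather today?", "lovely day, isn't it?"]},
    {"name": "jokes", "utterances": ["tell me a joke", "make me laugh"]},
]

# One Route per row: unpack the row's fields straight into the constructor.
routes = [Route(**row) for row in rows]

for route in routes:
    print(route.name, len(route.utterances))
```

With the real library, the stand-in dataclass would be swapped for the `Route` class imported from Semantic Router, and the rows would come from the downloaded dataset rather than a hard-coded list.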
So we now have like 50 routes, they look kind of like this, and we can go ahead and, well, 00:05:12.400 |
we need to initialize a route layer and to initialize a route layer, what we typically 00:05:23.160 |
We're going to use the local Hugging Face encoder. 00:05:27.820 |
And with that, we have both our encoder and routes, but we're also using a Pinecone index. 00:05:38.020 |
And we will also initialize our route layer with the Pinecone index. 00:05:45.020 |
And this is a new feature within the library as well. 00:05:49.380 |
You can also, if you want to, initialize it with the local index, but by default, it will 00:06:11.460 |
And we're going to go to API keys and just copy that. 00:06:19.640 |
So you will get this warning saying the index could not be initialized. 00:06:24.220 |
It's because we were initializing the index without any routes being attached. 00:06:29.060 |
It will be initialized correctly when we run this. 00:06:35.300 |
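A minimal sketch of the initialization described above, wrapped in a function so nothing runs until it is called. The import paths and class names (`HuggingFaceEncoder`, `PineconeIndex`, `RouteLayer`) reflect the library as it was around the time of this video and may differ in later releases. Reading the key from a `PINECONE_API_KEY` environment variable is likewise an assumption, and the helper name `build_route_layer` is made up for illustration.

```python
import os


def build_route_layer(routes, index_name="index"):
    """Sketch: build a route layer backed by a Pinecone index.

    Assumes semantic-router is installed with its local and Pinecone extras
    and that PINECONE_API_KEY is set in the environment.
    """
    if "PINECONE_API_KEY" not in os.environ:
        raise RuntimeError("Set PINECONE_API_KEY before building the route layer")

    # Imported lazily so the sketch can be loaded without the library installed.
    # These paths are assumptions based on the library version of the time.
    from semantic_router.encoders import HuggingFaceEncoder
    from semantic_router.index.pinecone import PineconeIndex
    from semantic_router.layer import RouteLayer

    encoder = HuggingFaceEncoder()                # local sentence-embedding model
    index = PineconeIndex(index_name=index_name)  # remote, persistent index
    # Passing the routes here is what embeds the utterances and sends them to Pinecone.
    return RouteLayer(encoder=encoder, routes=routes, index=index)
```

Calling the returned layer with a query string, for example `build_route_layer(routes)("how's the weather today?")`, should then be answered through Pinecone rather than a local index.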
So because we're using Pinecone, it will take a moment for the index to be created. 00:06:39.740 |
And then the embeddings will need to be created and sent across to Pinecone, which actually 00:06:46.180 |
That's very quick. Let me double check if we come into here and just refresh. 00:06:57.660 |
So we can see down here, we have our semantic router index. 00:07:01.420 |
We can go in and it's just 384-dimensional vectors because we're using a MiniLM model 00:07:14.180 |
Very small amount, but good enough for just showing you how to use it. 00:07:21.540 |
So with that in place, whenever we call our route layer, it's actually going to be going 00:07:26.500 |
via Pinecone rather than the local index now. 00:07:32.860 |
And we can see, okay, that triggers the chitchat route. 00:07:36.900 |
Now, one of the benefits to Pinecone is that we have this persistent index, okay? 00:07:44.620 |
We have all the routes that are persisted within Pinecone. 00:07:48.600 |
And well, what we can do is just load everything up from Pinecone, wherever we are from a new 00:07:53.820 |
environment, and it will just work, which is pretty nice. 00:08:00.100 |
I'm going to go ahead and delete the route layer, the index and the routes that we created. 00:08:09.660 |
And then I'm going to initialize a new Pinecone index. 00:08:13.820 |
Now you can see I'm also passing in the index name here. 00:08:21.220 |
But I just want to show you how you can initialize it with a custom index name if you prefer. 00:08:28.420 |
So I could call this like the Pinecone demo, for example. 00:08:35.340 |
But I've already created it, and it's called index, so I'm going to use that again. 00:08:41.340 |
So once I have done that and connected to my index, I can get the previous routes that 00:08:49.300 |
So I just run this, and that will get all of them. 00:08:54.340 |
So the format that it provides them back to us in is actually more of -- it's on more 00:09:01.740 |
So we need to convert -- you can see there's two utterances here for the single chit chat 00:09:08.220 |
So we need to convert this into a format that we can use to reinitialize our route layer. 00:09:15.740 |
So we're going to go ahead, we're going to create a routes dictionary, loop through this 00:09:21.700 |
and create, well, a ton of dictionary versions of these routes that we can then use to initialize 00:09:29.060 |
So I'm going to do that, and take a look at the routes dictionaries. 00:09:34.700 |
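The grouping step can be sketched in plain Python. This assumes the index hands back a flat list of `(route_name, utterance)` pairs, one per stored utterance. That shape is inferred from the description above, and the sample pairs are invented.

```python
from collections import defaultdict

# Hypothetical flat listing: one (route_name, utterance) pair per stored utterance.
pairs = [
    ("chitchat", "how's the weather today?"),
    ("chitchat", "lovely day, isn't it?"),
    ("cybersecurity", "how do I set up a firewall?"),
]

# Collect every utterance under the name of the route it belongs to.
grouped = defaultdict(list)
for route_name, utterance in pairs:
    grouped[route_name].append(utterance)

routes_dict = dict(grouped)
print(routes_dict)
```

The result is one dictionary key per route, with all of that route's utterances gathered into a single list.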
So you can see, if we go up to the top of it, you see here there's this route, cybersecurity, 00:09:43.860 |
We have chit chat here, right, and so on and so on. 00:09:52.340 |
So we're going to come to here and transform these into a list of route objects. 00:09:56.740 |
Okay, so I'm just iterating through the routes dictionary and pulling out the route name 00:10:03.660 |
and the utterances and mapping them to route objects. 00:10:07.420 |
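A small sketch of that mapping, again using a stand-in `Route` dataclass rather than the library's class so that it runs on its own. The dictionary contents are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Route:
    """Stand-in for the library's Route (assumption: only name + utterances)."""
    name: str
    utterances: list[str]


# Hypothetical routes dictionary: route name -> list of utterances.
routes_dict = {
    "chitchat": ["how's the weather today?", "lovely day, isn't it?"],
    "jokes": ["tell me a joke"],
}

# Pull out each route name and its utterances and map them to a Route object.
routes = [
    Route(name=name, utterances=utterances)
    for name, utterances in routes_dict.items()
]

print([route.name for route in routes])
```

The resulting list has the same shape as the routes built from the dataset earlier, so it can be passed to a new route layer in the same way.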
Okay, so now we get, you know, the same again. 00:10:11.940 |
Now what we want to do is initialize our route layer. 00:10:15.980 |
So again, we're just, at this point, it's basically the same as what we did before. 00:10:21.980 |
So we have our route layer, we have an encoder, the new routes that we loaded from Pinecone, 00:10:26.700 |
and obviously our Pinecone index, and now we can test it again. 00:10:30.260 |
So we'll, well, you can see it already works here. 00:10:37.100 |
It correctly identifies this as a joke, this one as a joke, and this one as chit chat. 00:10:50.900 |
We tried to make it as simple as possible, to be honest. 00:10:54.580 |
But as I said, it unlocks a lot of large-scale use cases and just the ability to persist your 00:11:03.240 |
route layers, you know, wherever you are, which is a nice little thing to have. 00:11:14.780 |
So thank you very much for watching and I will see you again in the next one.