
Faiss - Introduction to Similarity Search


Chapters

0:00 Introduction
3:21 Code Overview
6:15 Index Flat L2
8:10 Index Training
9:02 Adding Vectors
12:10 Query Time
13:00 Voronoi Cells
16:15 Coding
20:16 Index IVF
22:43 Product Quantization Index
25:54 Implementing Product Quantization Index
30:38 Comparison

Transcript

Hi, welcome to this video. We're going to be covering Facebook AI Similarity Search, or Faiss: what it is, how we can actually begin using it, and a few of the key indexes it offers. So just as a quick introduction, Faiss, as you can probably tell from the name, is a similarity search library from Facebook AI that allows us to compare vectors with very high efficiency.

So, if you've seen any of my videos before on building and comparing sentence embeddings, in those videos I just used a generic Python loop to go through and compare each embedding. And that's very slow. Now, if you're only working with maybe 100 vectors, that's probably OK, you can deal with it.

But in reality, we're probably never going to be working with a dataset that small. Faiss can scale to tens or hundreds of thousands of vectors, up to millions and even billions. So it's incredibly good for efficient similarity search. But before we get into it, I'll just quickly visualize what an index looks like.

So imagine we take all of the vectors that we have created and put them into our similarity search index. They could look like this. Now, this is only a three-dimensional space, but in reality there would be hundreds of dimensions. In our use case, we're going to be using vectors with a dimensionality of 768.

So, you know, there's a fair bit in there. Now, when we search, we introduce a new vector. Let's say it lands here: this is our query vector, xq. Now, if we were comparing every item, we would have to calculate the distance to every single vector.

So we would calculate the distance between our query vector and every other vector already in the index in order to find the vectors closest to it. Now, we can optimize this. We can decrease the number of dimensions in each of our vectors in an intelligent way, so they take up less space and the calculations are faster.

And we can also restrict our search. So in this case, rather than comparing every single item, we might restrict our search to just this area here. These are a few of the optimizations, at a very high level, that we can do with Faiss. So that's enough for the introduction.

Let's jump straight into the code. Okay, so this is our code. This is how we're loading in all of our sentence embeddings. I've gone ahead and processed them already, because they do take a little bit of time to build, but we're building them from this file here.

We'll load this into Python as well, but it's pretty straightforward: it's just a load of sentences separated by a newline character. And then in here, we have all of those NumPy binary files. Like I said, we're getting them from GitHub, over here.

That's where we're pulling them all in, using this cell here. That saves everything to file, and then we just read in each of those files and append them all into a single NumPy array. That gives us 14.5 thousand samples, where each embedding is a vector with 768 values inside.
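As a rough sketch, that step might look like the following. The filename pattern embeddings_*.npy is just an assumption for illustration; the real names come from the linked GitHub repo.

```python
import glob

import numpy as np

# Hypothetical filename pattern; the actual files come from the linked repo
arrays = [np.load(f) for f in sorted(glob.glob('embeddings_*.npy'))]
sentence_embeddings = np.concatenate(arrays)

print(sentence_embeddings.shape)  # roughly (14500, 768)
```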

So that's how we're loading in our data. I'll also load in that text file. So we just want to do with open on sentences.txt, reading that in as a normal file, and we write lines = fp.read(). And like I said, we're splitting that by newline characters.
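A minimal sketch of that step, assuming the file is saved as sentences.txt:

```python
# Read the raw text file and split it into one sentence per line
with open('sentences.txt', 'r') as fp:
    lines = fp.read().split('\n')

print(lines[:3])  # preview a few sentences
```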

And we can see a few of those sentences as well. Okay. Now, to convert from those sentences into sentence embeddings, I need to import the library anyway for later on, when we're building our query vectors, so I'll just show you how I do that now.

All we do is import SentenceTransformer from sentence-transformers, which is the library we're using to create those embeddings. Then for our model, we initialize SentenceTransformer with the bert-base-nli-mean-tokens model. Okay, so that's how we initialize our model. And when we're encoding our text, as we'll see in a moment, we just write model.encode and pass something in, like 'hello world'.
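In code, that looks roughly like this, using the model name mentioned above:

```python
from sentence_transformers import SentenceTransformer

# Initialize the model that produced our 768-dimensional embeddings
model = SentenceTransformer('bert-base-nli-mean-tokens')

# Encoding text returns a sentence embedding
embedding = model.encode(['hello world'])
```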

Okay, and that will give us a sentence embedding. So that is what we have inside here: the sentence embeddings of all of our lines. Now, I think we have everything we need to get started, so let's build our first Faiss index.

So the first one we're gonna build is called IndexFlatL2. This is a flat index, which means that all the vectors are just flat vectors; we're not modifying them in any way. And the L2 stands for the distance metric that we're using to measure the similarity, or proximity, of each vector.

And L2 is just Euclidean distance, so it's a pretty straightforward function. Now, to initialize that, we first need to import faiss, and then we write index = faiss.IndexFlatL2. And in here, we need to pass the dimensionality of our vectors, our sentence embeddings.

Now, what is our dimensionality? Each embedding is 768 values long, and a nicer way of writing that out is sentence_embeddings.shape[1]. Our index requires that in order to be properly initialized.
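As a sketch, with sentence_embeddings as the NumPy array we built earlier:

```python
import faiss

d = sentence_embeddings.shape[1]  # dimensionality of our embeddings, 768

# A flat index: exhaustive search using Euclidean (L2) distance
index = faiss.IndexFlatL2(d)
```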

So that's going to initialize the index. And there is one thing that we need to be aware of: sometimes with these indexes, we will need to train them. If the index is going to do any clustering, we will need to train that clustering algorithm on our data.

Now, we can check whether an index needs training, or is trained already, using the is_trained attribute. And we'll see with this index, because it's just a flat L2 index that isn't doing anything special, we don't need to train it.

And we can see that when we check is_trained: it returns True, which just means we don't actually need to train it. So that's good. Now, how do we add our vectors, our sentence embeddings? All we need to do is write index.add and pass in the embeddings, like so.
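A minimal sketch of the add step:

```python
# is_trained is True for a flat index, so we can add vectors straight away
index.add(sentence_embeddings)

# ntotal reports how many vectors the index now holds
print(index.ntotal)  # ~14500 in our case
```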

So pretty straightforward. And then from there, we can check that they've been added properly by looking at the ntotal value, the number of embeddings, or vectors, that we have in our index. And with that, we can go ahead and start querying. So let's first create a query.

So we'll create xq, our query vector, with the model.encode call that we used before. I'm going to write "someone sprints with a football". Okay, that's going to be our query. And to search, we write D, I = index.search(xq, k).

And in here, we need to add k as well, so let me define it above. k is the number of similar vectors that we'd like to return; I'm going to return four. So with this, we will return four vector IDs into this I variable here.

I'm going to time it as well, just so you see how long it takes. And let's print I. You can see that we get these four items. Now, these align to our lines, the text that we have up here. So what we can do is print all of those out.

So in here, we want to write lines[i] for i in I[0]. Okay, so these are the similar sentences that we got back, and obviously it seems to be working pretty well.
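Putting the whole query step together, a minimal sketch:

```python
k = 4  # number of nearest neighbours to return

# Encode the query sentence into the same 768-dimensional space
xq = model.encode(["someone sprints with a football"])

# D holds the distances, I the IDs of the nearest vectors
D, I = index.search(xq, k)

# Map the returned IDs back to the original sentences
print([lines[i] for i in I[0]])
```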

All of them are talking about football or being on a football field. So that looks pretty good, right? The only problem is that this takes a long time. We don't have that many vectors in there, and it took 57.4 milliseconds. So it's a little bit long, and something we can actually improve.

Okay, so before we move on to the next index, I just want to have a look at the sort of speed that we would expect, because this is a very small dataset. So what else could we expect? If we go over here, I've already written all this code.

If you'd like to go through this notebook, I'll leave a link in the description. So coming down here, we have this flat L2 index, and this is the query time for a randomly generated vector with a dimensionality of 100, plotted against the number of vectors within the index.

So we go up to 1 million here, and this is the query time in milliseconds. You can see it increases quite quickly. Now, this is in Faiss, but it's still an exhaustive search; we're not really optimizing the way we could, and we're not using the approximate search capabilities of Faiss.

So if we switch back over to Faiss, we can begin using approximate search by adding partitioning to our index. Now, the most popular approach uses a technique very similar to something called Voronoi cells. I'm not sure exactly how you pronounce it, but I think that's about right. And I can show you what that looks like.

So over here, we have what's called a Voronoi diagram, and each of the sort of squares that you see is a Voronoi cell.

Now, as well as those, we also have our centroids, which are simply the centers of those cells. When we introduce a new vector, our query vector, into this space, let's say it appears here.

Now, within each one of these cells, we actually have a lot of other vectors; we could have millions in each cell. And if we were to compare our query vector to every single one of those vectors, it would obviously take a long time.

We'd be going through every single one, and we don't want to do that. So what this approach allows us to do is, instead of checking against every one of those vectors, just check against every centroid. Once we figure out which centroid is the closest, we limit our search scope to only the vectors within that centroid's Voronoi cell.

So in this case, it would probably be this centroid here that is the closest, and we would limit our search to within these boundaries. Now, what we might find is that the closest vector within this cell is actually here, whereas the closest vector overall is right there, just outside it.

So in reality, that other vector might actually be more similar to our query. And that's why this is approximate search rather than exhaustive search: we might miss out on something, but that's outweighed by the fact that this is a lot, lot faster.

So there are pros and cons; it's whatever works best for your use case. Now, if we want to implement that in code, the first thing we want to do is define how many of those cells we would like. I'm going to go with 50, using this nlist parameter.

And then from there, we can set up our quantizer, which is like another step in the process. With our index, we are still going to be measuring the L2 distance, so we still need that flat index in there. To do that, we write faiss.IndexFlatL2 and pass our dimensions again, just like we did before.

And like I said, that's just one step in the process; it's not our full index. Our full index is going to look like this: we write index, and in here we're going to have a new Faiss index, the one that creates those partitions.

So we write faiss.IndexIVFFlat, and in there we need to pass our quantizer, the dimensions, and also nlist. Okay, now, if you remember what I said before, in some cases we'll need to train our index, and this is an example of one of those times.

Because we're doing the clustering and creating those Voronoi cells, we do need to train it, and we can see that because is_trained is currently False. Now, to train it, we just write index.train, and in here we pass all of our sentence embeddings, like so.

Let's run that. It's very quick. And then we can check is_trained again, and we see it's True. So now our index is ready to receive our data, and we do this exactly the same way as before: we write index.add and pass our sentence embeddings again.
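A sketch of the full IVF setup, reusing d and sentence_embeddings from before:

```python
nlist = 50  # number of Voronoi cells to partition the vectors into

# The quantizer still measures L2 distance; it assigns vectors to cells
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist)

# IVF indexes must be trained so the clustering can learn the cell centroids
index.train(sentence_embeddings)
assert index.is_trained

index.add(sentence_embeddings)
```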

And we can check that everything is in there with index.ntotal. Okay, so now our index is ready and we can begin querying it. What I'm going to do is use the exact same query vector that we used before, and time it so we can see how quick this is compared to our previous query.

And we're going to write the exact same thing we wrote before, so I'll just copy it and bring it down here. There we go. So now let's have a look: 7.22 milliseconds in total. If we bring it up here, we had 57.4 before. Now, that is maybe a little bit slow.

We'll see that the times do vary a little bit, quite randomly, so maybe that was a little slow, but it's probably pretty realistic. So that took 57 milliseconds; this one, seven. Now let's have a look at the IDs we've got and compare them to what we had before.

And I believe they're all the same. So we've shortened the query time by a lot, and we're getting the exact same results, which is pretty good. Now, sometimes we will find that we do get different results. A lot of the time that's fine, but if you find the results are not that great with this sort of index, it just means the search is not exhaustive enough.

We are using approximate search, but maybe we should approximate a little bit less and be slightly more exhaustive. We can do that by setting the nprobe value, which I'll explain in a minute. Let me first just run this, and we can see it will probably take slightly longer.
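Bumping up nprobe is a one-liner; the exact value below is an assumption for illustration (the comparison graph later uses 10):

```python
# Search more cells per query: larger nprobe = more exhaustive, but slower
index.nprobe = 10  # hypothetical value; tune for your accuracy/speed trade-off

D, I = index.search(xq, k)
```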

So yeah, we get 15 milliseconds here. Of course, we get the same results again, because there were no accuracy issues here anyway. But let me explain what that is actually doing. So in this diagram, what you can see is an IVF search where we are using an nprobe value of one.

So we're just searching one cell, based on the nearest centroid to our query vector. Now, if we increase this up to eight, or let's use a smaller number in this example, say four, we would search our four nearest centroids: probably this one, this one, this one, and the one we've already highlighted.

All of those would now be in scope, because our nprobe value, the number of cells that we are going to search, is four. Now, if we increase it again, to say six, these two cells might also be included. Of course, when we do that, we are searching more, so we might get better accuracy.

But the query time is also going to increase, and we don't want the time to increase, so there's a trade-off between the two. In our case, we don't really need to increase it, so we don't need to worry about it. So that is the IVF index.

Now, we have one more index that I want to look at, and that is the product quantization index. Here we use IVF, and then we also use product quantization on top. It's probably better if I try to draw this out. So when we use product quantization, imagine we have one vector here.

So this is our vector. Now, the first step in product quantization is to split it into several sub-vectors. We pull these out, and they are now their own sort of mini-vectors. And this is just one vector that I'm visualizing here; we would obviously do this with many, many vectors.

So there would be many, many more; in our case, just under 15,000. That means we have a lot of these sub-vectors, and what we do with these is run them through their own clustering algorithm. We end up getting clusters, and each of those clusters is going to have a centroid.

So this one would also be run through one. Each subset of vector slices is run through its own clustering algorithm, creating these centroids, and these centroids are smaller in size than the original sub-vectors. Then, each of these sub-vectors gets pulled into here.

So maybe this one is here, and it gets assigned to its nearest centroid. Then we take that assignment all the way back over here and add it into our vector, so this is centroid three, for example. Though when I say "assign it back", that's probably the wrong way to think about it.

Maybe it's more like this: the vector becomes a new vector built from those centroid IDs. Okay, so this one would be three. Now, what that does is reduce the size of our vectors, pretty significantly, depending on what dimensions we use. So, back to the code.

Let's implement that. Now, we need to define two new variables here. First is m, which is going to be the number of centroid IDs in the final vector. One thing we do need to know with m is that d must be divisible by m.

So what is our d value? I can't remember; let me check. So it's 768. Now, we should be able to divide that by eight. Yeah, so this is good: we can use eight for m. But we couldn't use something like five, because if we try five, we see that it doesn't fit into d.

Five doesn't fit nicely into d, whereas eight does. So d must be a multiple of m, otherwise we're going to get an error, and that's because of the way those vectors are broken down into the final centroid ID vectors. We also need to specify the number of bits within each of those centroid IDs.

For this value, we can use what we want; I'm going to use eight. Then we can set up our index and also the quantizer. We use a quantizer as we did before, so it's going to be faiss.IndexFlatL2 with d. And our index here is going to be a new one.

This is IndexIVFPQ. So it's no longer a flat index storing the full vectors; it stores quantized vectors, where we have reduced their size through the method I explained and drew out before. Now, we need to pass a few arguments in here.

The first one is the quantizer, then d, our dimensionality, then nlist, m, and bits. We pass all of those to our index, remembering the faiss prefix as well. And there we go, we now have our index again. You may have guessed that we need to train this one too.

There we go. So to train it, we just write index.train with our sentence embeddings. Okay, it might take a little bit longer this time. There we go. And then we can add our vectors. Now, after adding those, let's see how quick this is. It should be a fair bit quicker, although it's hard to get much quicker than the last one.
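Pulling the IVFPQ setup together, a rough sketch:

```python
m = 8     # sub-vectors per vector; d (768) must be divisible by m
bits = 8  # bits per centroid ID, so 2**8 = 256 centroids per sub-quantizer

quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, bits)

# Training here learns both the IVF cells and the PQ centroids
index.train(sentence_embeddings)
index.add(sentence_embeddings)
```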

So we're going to use the same search code as before; I'll take it right down here. And see: 2.86 milliseconds. So we've gotten a lot faster. We've gone from 57 milliseconds up here down to under three. Now, there is one thing to note here.

These values are now different, so the accuracy has decreased. If we look at the last results, we still have the 190, and we still have the 12465, but these two at the front are now different.

And this is just one of the trade-offs of accuracy versus speed. So if we come down here and have a look at what we're pulling through, I'll copy this again. So we have these. Although the accuracy has technically decreased, in that we're not getting the exact same results as the exhaustive search, they're still pretty good results.

So nonetheless, I think that is pretty cool. Now let's compare this to our previous two methods using the graphs, as we did before. So here is that final one: we have IVFPQ along the bottom, and yeah, it's a lot faster, right?

Then we have IVF flat with an nprobe value of 10, which is much faster than flat L2, but still not quite as fast as PQ. And then we have flat L2 at the top, which is obviously the slowest. Just be aware as well that on the left here, we have a large scale.

So the differences are pretty significant by the time we reach the 1 million mark. And I think that's it for this video. Faiss is obviously pretty cool and definitely really useful, and I think we're definitely going to explore it more in the future. So for now, that's it.

So thank you for watching, and I'll see you in the next one.