
Faiss - Vector Compression with PQ and IVFPQ (in Python)


Chapters

0:00 Intro
0:24 Demonstration
1:15 Dataset
2:05 Initialization
4:15 Adding vectors
9:40 Memory Usage
11:45 Composite Index
14:30 Results

Transcript

Hi, welcome to this video. We're going to have a look at how we can implement a product quantization index, or IndexPQ, in Faiss, and we're also going to have a look at how we can implement a composite index using an inverted file list (IVF) together with PQ to improve our performance even further.

Now, we'll just have a quick look at this. If you watched the previous video on the logic behind PQ, you'll recognize this, but this is just demonstrating the speed increase that we can get and, mainly, the memory decrease, which is the important factor in PQ. And obviously, there's a huge improvement here.

So the speed increase is five to six times with just PQ, and the memory decrease is huge: roughly a 97% reduction. Now, I'm not going to go through how all of that works, because we already covered it in the last video, and there is a link to that video in the description.

So if you'd like to go through that, go ahead. But if you just want to see how this works in Faiss, then continue and we'll jump straight into it. Okay, so the first thing that we need to do is actually get our data. We're going to be using the SIFT1M dataset.

Again, in the description, there's a link to a notebook where you can download the dataset and load it into your own notebook if you'd like. It's a pretty straightforward, very popular dataset for these sorts of examples; I use it all the time. And I'll just show you its shape.

So xb contains the vectors that we'll be indexing. We're going to index these vectors in Faiss and search through them. The full dataset has 1 million vectors. I was going to stick with 500K, because it can take a little bit of time for everything to load.

Although, that being said, I don't think it's that bad. So, you know what? Let's just go with 1 million; it's more interesting. And then for xq, I'm just using a single query vector, so we have one vector in there. As you can see, the dimensionality of all of our vectors is 128, which is pretty typical, though definitely on the lower end for dense vectors.
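
Here's a minimal sketch of loading the data, assuming you've already downloaded and extracted the SIFT1M archive (the file paths are placeholders for wherever you put it):

```python
import numpy as np

def read_fvecs(path):
    # .fvecs stores each vector as [dim (int32)] followed by dim float32 values
    a = np.fromfile(path, dtype='int32')
    d = a[0]
    return a.reshape(-1, d + 1)[:, 1:].copy().view('float32')

xb = read_fvecs('sift/sift_base.fvecs')       # (1_000_000, 128) vectors to index
xq = read_fvecs('sift/sift_query.fvecs')[:1]  # one query vector, shape (1, 128)
```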

So let's first initialize our PQ index. We need to import faiss, and we'll get our dimensionality, which is xb.shape[1]. Now, I'm using slightly different syntax than usual to align better with the PQ notation. Usually you'd go with lowercase d, and that's fine.

It's not really an issue; I'm just going with uppercase D for now to align with that notation, if you're following on from the previous video. And we're going to set m equal to 8, which is how many subvectors we split each vector into. Now, typically we'd assert that D is divisible by m, but we already know it is, so I was going to skip that.

But that's something you do need to make sure of. In fact, you know what? I'm going to put it in anyway. So let's assert that D is divisible by m. We need this because otherwise we wouldn't be splitting our vectors into equally sized subvectors, which isn't allowed.

So we do need to make sure that's always the case. And then we're going to set the nbits value, which is the number of bits per subquantizer. If you watched the previous videos, you'll recognize this: the k* value is equal to 2 to the power of nbits.

That is, the number of centroids we have within each subquantizer. And then we initialize our index: index = faiss.IndexPQ(D, m, nbits). Okay, that's our index ready to go, and what we need to do now is actually add our vectors. Now, you have to remember that with product quantization, we are clustering.

Our subquantizers use clustering, so we do actually need to train the index. We can check whether an index needs to be trained by writing index.is_trained. In this case, is_trained is False, so that means we need to train it before we add our vectors.

To do that, we write index.train and pass it xb. We run that, and it can take a little bit of time, though it shouldn't be too long. That changes if you start increasing the nbits value, which is probably a good idea; an nbits of maybe 11 is actually what's recommended in the PQ paper.
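
Putting those steps together, here's a minimal sketch (it assumes xb from the data-loading step above):

```python
import faiss

D = xb.shape[1]    # 128-dimensional vectors
m = 8              # number of subvectors per vector
assert D % m == 0  # each subvector gets D / m = 16 dimensions
nbits = 8          # bits per subquantizer, so k* = 2**nbits = 256 centroids each

index = faiss.IndexPQ(D, m, nbits)
print(index.is_trained)  # False: PQ clusters, so it must be trained first
index.train(xb)          # learns the subquantizer centroids
```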

But I'm not using it here, because it takes a long time to train and build the index, so I'm avoiding it. If you're using this in practice, though, I definitely recommend trying those larger nbits values. It's also worth noting that when we use the composite index of IVF and PQ, we can't use an nbits value greater than 8.

But we'll move on to that anyway; that's just something that's hard-coded into Faiss for now, and I think they should be removing that limitation at some point. So we have now trained, and we can add our vectors. Then what we want to do is return, say, the top k most similar vectors.

We'll go with a k of 100. And then we want to do our search. Now, typically we'd call the distances d, but obviously we're already using D, so I'm going to write dist, I = index.search, and in here we pass our query xq, and we also pass k.

And that will return the k nearest neighbors from the search. So if we check the shape, we see that we have 100 of those. I contains the indices of the nearest neighbors, and dist contains the actual distances that were returned.
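
A minimal sketch of that add-and-search step, continuing from the index above:

```python
index.add(xb)  # encode with PQ and store all database vectors

k = 100  # how many nearest neighbors to return
dist, I = index.search(xq, k)  # I: neighbor indices, dist: their distances
print(I.shape)  # (1, 100): one row of 100 ids for our single query
```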

Now let's go ahead, and what I'm going to do is time it. We can use the %timeit magic to run this cell multiple times and get the average time taken, which is quite useful when we're comparing different indexes. And we'll do index.search.

And we pass xq and k. We see that took 3.34 milliseconds, so reasonably fast. Let's compare that to a flat index. We'll do l2_index = faiss.IndexFlatL2(D), with a capital D, then l2_index.add, and we want to add all of our data.

So that's xb. And let's time that again; I'm just going to copy the cell from before and replace the index with l2_index. And let's see what we get. Okay, so an average of 19.5 milliseconds. So already this is pretty quick, which is cool. Now we can check the recall performance of our index as well.
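
In a notebook, that comparison looks something like this (the timings are just what this particular run produced):

```python
%timeit index.search(xq, k)  # PQ index: ~3.34 ms per search here

l2_index = faiss.IndexFlatL2(D)  # exhaustive, uncompressed baseline
l2_index.add(xb)
%timeit l2_index.search(xq, k)  # flat index: ~19.5 ms per search here
```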

Now, this varies across the times I've tested it, but we should get something around maybe 50%. So 50% recall isn't exactly cutting edge, but that's PQ; that is how it is. If we want higher recall, we can increase nbits to get it, but low recall is definitely a drawback of using PQ.

Okay. So we want to sum one value for every ID returned by our PQ index that also appears in the L2 results. We're just asking: how many of the results returned by our PQ index were also returned by our L2 index, which we are treating as 100% recall.

So the L2 index is our 100%-recall baseline, and here we're checking how many of those we match using our PQ index: if i in l2_I. Now, I actually need to copy the flat search and pull it down here, because when we use %timeit, we can't get any variables out of it, which, I don't know why, is a little bit annoying. So we rerun it to get l2_dist and l2_I, and this time we get 38% recall.
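
As a small sketch, the recall check can be written like this (it reruns the flat search so its outputs are available outside %timeit):

```python
# rerun the exhaustive search to capture its outputs
l2_dist, l2_I = l2_index.search(xq, k)

# recall: fraction of PQ results that the flat (100%-recall) search also returned
recall = sum(1 for i in I[0] if i in l2_I[0]) / k
print(recall)  # ~0.38 in this run; it varies between runs
```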

Okay, so let's now have a look at the memory usage of each of these. I'm just going to define a function, get_memory, so we can more easily compare them. What we'll do is just write our index to file, like so.

I'll call it temp.index, because we're going to delete it straight away afterwards; we're just writing it to file so we can see how large the index is, and then we delete it again. Then file_size = os.path.getsize gives us the size of that index file in bytes.

So we'll pass temp.index. We're only using the file to check the size, so we just delete it afterwards, like so, and then we return the file size. Okay, so let's do that for both of our indexes, getting the memory for the L2 index first.
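
Here's a sketch of that helper and the two calls (the temp file name is just a throwaway):

```python
import os

def get_memory(index):
    # write the index to disk, read the file size in bytes, then clean up
    faiss.write_index(index, './temp.index')
    file_size = os.path.getsize('./temp.index')
    os.remove('./temp.index')
    return file_size

print(get_memory(l2_index))  # ~512 MB for the flat index
print(get_memory(index))     # ~8 MB for the PQ index
```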

Let me fix that; it's os.remove, not os.remote. Okay, so the flat index is pretty big: half a gigabyte, which is massive. And then we get the memory again for our PQ index and see what we have. Now we have eight megabytes, so that's a massive drop: eight over 512.

So it's a 98.4% reduction in size, which is huge and really cool. Now, like I said before, we have this composite index with both an IVF step and a PQ step. We're going to have a look at how to implement that, and we'll see that the memory decrease is slightly less significant, because it's a more complex index, although it's still very small.

And then the speed-up will be absolutely insane. So we'll go into that. Now we have nlist, which is the number of cells within our IVF index. Let's have a quick look at what that might look like. So imagine these are our PQ vectors in a 2D space.

Adding IVF essentially does this: we add all these separating cells, called Voronoi cells, and each vector within a cell is assigned to that cell's centroid. Now, the magenta dot at the bottom is our query vector, and it's landed within the cell that you see highlighted.

Because it's landed in there, we only compare it to the vectors that are also assigned to that same cell. And that's what nlist controls: how many of those cells we'd like to have. Now, we were going to go with 2,048, but it's worth noting that nlist must be greater than or equal to k*, which is equal to 2 to the power of nbits.

So in this case, we could go as low as 256, to be fair; let's go with that for now and see how it goes. And then we initialize our index like this: index = faiss.IndexIVFPQ. In here, we first need to pass our vectors. When I say vectors, I mean the index that contains our vectors.

You'll typically see this written as quantizer, but I'm going to call it vecs because I think it's a little less confusing. So we'll do faiss.IndexFlatL2; we're just going to use this as our starting point. But that doesn't mean we're storing the vectors as-is in our index.

They actually get processed using the PQ step, as we already did before. So we pass vecs, D, nlist, m, and nbits. Okay, so we now have our index, and we can check whether it's trained, like we did before. Obviously it will not be, because we are still using PQ.

So we need to train it, the same as we did before: index.train(xb). After that, we add our vectors, and we can get our results with index.search(xq, k). Okay, now we haven't seen how fast that is, so let's have a look; we'll just time it again.
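
Pulling the IVFPQ steps together, a minimal sketch (nlist and the variable name vecs follow the choices above):

```python
nlist = 256  # number of Voronoi cells; must satisfy nlist >= k* = 2**nbits

# the coarse quantizer: a flat index used only to assign vectors to cells
vecs = faiss.IndexFlatL2(D)

index = faiss.IndexIVFPQ(vecs, D, nlist, m, nbits)
index.train(xb)  # learns both the cell centroids and the PQ codebooks
index.add(xb)

dist, I = index.search(xq, k)  # only probes index.nprobe cells (default 1)
```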

Okay, and although the cell took longer overall, that's because %timeit looped through it; we had 10,000 loops for this test. And the number we got per search is insanely small: 0.1 milliseconds. What did we get before?

It was here: 20 milliseconds using our flat index. And if we go up a little further, we see 3.3 milliseconds for the PQ search up there. So yeah, super fast. And then if we check the recall, we'll use the same code as before. So where is our sum here?

Like I said, this can go up and down. It's at 24% this time, which is pretty bad. But I mean, like I said, PQ isn't the best for recall. What we can do, though, because we're using IVF, is increase the number of cells that we search through.

You can kind of see that happening here. Using the example from before, we start with an nprobe value of 1, which is the default, then increase it to 2, 3, 4, 5, and so on. And we can increase that all the way up until we search all of our cells.

Of course, that is kind of a waste of time, because if you're searching through all of your cells, you're basically just doing a flat search with the additional overhead of the IVF index, so it's even slower. But we can use that information: we know that if we set index.nprobe equal to 2,048, which is the maximum number of cells that we set...

Or did we use 256? Yes, we used 256, the nlist value, so we can just put nlist here. If we do that and run the search again with xq and k, getting dist and I, then we come back up here and get our recall.

We see that the maximum recall we're going to get with this index is 32%. Now, this is pretty low; when I was working on it before, I was getting 50-52%. There is some randomness in what your index will output. But of course, we can also increase nbits or try some other things.

So we can increase nbits, try decreasing m, and see how things work. But it's also worth noting that with IVFPQ, the nbits value does have to be 8, at the moment at least; it's hard-coded into Faiss, so that limit is there unless you want to go into the Faiss code and change it yourself.

Now, what we want to do is obviously not search through the entire index, so we're going to change nprobe to something that isn't quite as high, maybe 20. Let me also bring this down. With 20 we get 33%, so that's high enough for us to get our maximum performance.

But obviously, it's going to be a lot quicker, because we're searching through 20 cells rather than 256. Let's try 10: 32%. Increase it a little to 12: still 32%. With 16: 33%. So it's around this area; trying 14, then 13, we find 13 is our optimal nprobe value, where we get the max recall in the quickest time.
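
A rough version of that sweep, reusing the recall check from earlier (exact numbers vary run to run):

```python
# try increasing nprobe values and watch where recall plateaus
for nprobe in [1, 5, 10, 12, 13, 14, 16, 20, nlist]:
    index.nprobe = nprobe
    dist, I = index.search(xq, k)
    recall = sum(1 for i in I[0] if i in l2_I[0]) / k
    print(nprobe, recall)
```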

But of course, which parameters you use is completely up to you, and it depends on your use case as well. So that's it for this video. Thank you very much for watching. I hope it's been useful, and I will see you in the next one. Bye!