back to indexFaiss - Vector Compression with PQ and IVFPQ (in Python)
Chapters
0:0 Intro
0:24 Demonstration
1:15 Dataset
2:5 Initialization
4:15 Adding vectors
9:40 Memory Usage
11:45 Composite Index
14:30 Results
00:00:00.000 |
Hi, welcome to this video. We're going to have a look at how we can implement a product quantization 00:00:06.800 |
index or index PQ in FICE, and we're also going to have a look at how we can implement a composite 00:00:13.920 |
index using an inverted file list, IVF, and also PQ together to improve our results or our 00:00:23.200 |
performance even further. Now, we'll just have a quick look at this. If you watched the previous 00:00:28.240 |
video on the logic behind PQ, you'll recognize this, but this is just demonstrating the speed 00:00:33.440 |
increase that we can get and also mainly the memory increase, which is the important factor in 00:00:39.680 |
PQ. And obviously, there's a huge increase here. So the speed increases is a five to six times 00:00:46.960 |
increase with just PQ, and the memory decrease is huge. It's like a 97% reduction in memory. 00:00:55.040 |
Now, I'm not going to go through how all of that works because we already did it in the last video. 00:01:01.200 |
So there is a link to that video in the description. So if you'd like to go through that, 00:01:07.200 |
go ahead and go through it. But if you just want to see how this works in FICE, 00:01:11.200 |
then continue and we'll jump straight into it. Okay, so the first thing that we need to do is 00:01:19.520 |
actually get our data. So we're going to be using the SIFT1M dataset. Again, in the description, 00:01:25.840 |
there's a link to a notebook where you can download and also load that into your own notebook 00:01:32.080 |
if you'd like to do that. But I should say it's a pretty straightforward, very popular 00:01:38.320 |
dataset for these sort of examples I use all the time. And I'll just show you the sort of shape. 00:01:45.040 |
So XB, it is the vectors that we'll be indexing. So we're going to index these vectors in our index 00:01:53.760 |
in FICE and search through them. Typically, it has 1 million vectors. I'm going to stick with 00:02:00.240 |
500K because it can just take a little bit of time for everything to load. 00:02:03.680 |
Although that being said, I don't think it's that bad. So let me, yeah, you know what? Let's just 00:02:11.200 |
go with 1 million. It's more interesting. And then XQ, I'm just using a single vector here. 00:02:19.760 |
So we just have one vector in there. As you see, dimensionality for all of our vectors here is 128, 00:02:25.280 |
which is pretty typical, but it's definitely on the lower end for dense vectors. So let's first 00:02:32.320 |
initialize our PQ index. So we need to import FICE and we'll get our dimensionality, which is 00:02:39.760 |
XB.shape1. Now I'm using this slightly different syntax to usual to align better with the PQ 00:02:50.320 |
notation. Usually, you can go with lowcase D and you can go with that. It's fine. It's not really 00:02:55.200 |
an issue. I'm just going to go with uppercase for now to align better to that notation if you are 00:03:01.520 |
following on from the previous video. And we're going to set M equals to 8, so that's how many 00:03:06.320 |
subvectors we have. Now, typically, we assert that D is divisible by M, but we already know it is, 00:03:13.200 |
so I'm going to skip that. But that's something that you do need to make sure that D is divisible. 00:03:20.240 |
In fact, you know what? I'm going to put it in anyway. So let's assert that D is divisible by M. 00:03:27.040 |
Okay, just put this in because otherwise we are not splitting our vectors or subvectors equally, 00:03:33.520 |
and we will not be allowed to do that. So we do need to make sure that's always the case. 00:03:37.840 |
And then we're going to set the nBits value. So this is the number of bits per subquantizer. So 00:03:45.840 |
if you were watching previous videos, we would see this as, so the K star value is equal to 2 00:03:56.240 |
to the power of nBits. Okay, so the number of centroids that we have within each subquantizer. 00:04:03.920 |
And then we initialize our index. So index equals FICE, index PQ, D, M, and nBits. 00:04:13.040 |
Okay, that's our index ready to go. And what we need to do now is actually add our vectors. Now, 00:04:23.520 |
you have to think with product quantization, we are clustering. Our subquantizers are using 00:04:30.480 |
clustering. So we do actually need to train it. So we can see if an index needs to be trained or not 00:04:37.840 |
by using this. So we write index is trained. In this case, is trained is false. So that means we 00:04:42.880 |
need to train it before we add our vectors. So to do that, we need to write index train, 00:04:51.120 |
and then we pass it xB. We run that. And this can take a little bit of time. It shouldn't be too 00:04:56.080 |
long. But if you start increasing the nBits value, which is probably a good idea. I mean, nBits of 00:05:06.160 |
maybe 11 is actually what is recommended in the PQ paper. But I'm not using it here because it 00:05:14.480 |
takes a long time to train and build your vectors. So I'm avoiding it here. But if you're using this, 00:05:24.400 |
I definitely recommend trying those larger nBits values. It's also worth noting that 00:05:32.640 |
when we're using composite index of IVF and PQ, we can't use an nBits value of greater than 00:05:40.080 |
8. But we'll move on to that anyway. That has to be something that they've hard-coded into FICE. 00:05:47.280 |
For now, I think they should be removing that limitation at some point. So we have now trained, 00:05:54.320 |
and now we can add our vectors. So now what we want to do is we'll return, say, the top 00:06:03.280 |
k most similar vectors. We'll go with k of 100. And then we want to do our search. 00:06:10.320 |
So we're going to go distance. So typically, we'd write d here. But obviously, we're already using 00:06:17.040 |
d. So I'm going to write distance equals index.search. And then in here, we get past our 00:06:24.400 |
query. And we also pass k. And that will return the k nearest neighbors in that search. So in here, 00:06:35.120 |
if we write shape, we see that we have 100 of those. And then distances here are the actual 00:06:41.120 |
distances. So i are the indices of the nearest neighbors. Distance is the actual distances that 00:06:47.840 |
they have returned. Now let's go ahead. And what I'm going to do is I'm going to time it. So we 00:06:54.240 |
can use this time function to essentially run this cell multiple times and get the average time 00:07:02.240 |
taken to run it, which is quite useful when we're comparing different indexes. 00:07:07.440 |
And we'll do index.search. And we want x, q, and k. Now we see that that took 3.34 milliseconds. 00:07:21.440 |
So reasonably fast. Let's compare that to a flat index. So we'll do L2 index equals FICE 00:07:29.760 |
index flat L2, d with a capital, L2 index dot add. And we want to add all of our data. So it's b. 00:07:44.880 |
And let's time that again. So I'm just going to copy this. 00:07:50.560 |
Here. And we'll just replace that with L2 index. And let's see what we get. OK. So an average of 00:07:58.320 |
19.5 milliseconds. So already, this is pretty quick, right? Which is cool. Now we can check 00:08:06.720 |
the performance of our index as well. Now this varies from all the times I've tested it. But we 00:08:13.840 |
should get something around maybe 50. So 50% recall isn't exactly cutting edge. But that's PQ. 00:08:22.480 |
That is how it is. If we want high recall, we can increase n bits to get that. But it's definitely 00:08:29.920 |
a drawback of using PQ. OK. So we want to sum one value for every value that we find in our index. 00:08:41.920 |
So in the return indexes for our PQ results, if they are in the L2 results. So we're just saying, 00:08:51.840 |
you know, how many of the results returned by our PQ index were also returned by our L2 index, 00:08:57.680 |
which we are seeing as the 100% recall. So L2 index, 100% recall. And here we're seeing, OK, 00:09:06.080 |
how many of those do we match to using our PQ index. If i in L2 i. So I actually need to just 00:09:16.240 |
copy this, pull it down here. So when we use a time it, we can't get any variables out from it, 00:09:24.160 |
which is, I don't know why, but it's a little bit annoying. So we will get L2 this and L2 i. 00:09:34.560 |
OK. So this time we get 38% recall. OK. So let's now have a look at the memory usage 00:09:47.520 |
of each of these. So I'm just going to define a function, get to memory, 00:09:51.120 |
so we can more easily compare those. And what we'll do is we'll just write our index to file, 00:10:00.080 |
like so, index. And I'll just call it temp index now, because we're going to delete it straight 00:10:06.640 |
away afterwards. We're just writing it to file so we can read, see how large the index is. 00:10:12.480 |
And then we delete it again. The file size equals OS path dot get size to get size of that index in 00:10:23.120 |
bytes, I think. So we'll do temp dot index. And then we don't really-- we're just using that to 00:10:31.680 |
check the size, so we just want to delete that afterwards. So like so. And then we just return 00:10:40.720 |
the file size. OK. So let's do that for both of our indexes. So we get the memory for the L2 index 00:10:52.880 |
first. Let me-- so that's remove, not remote. OK. So we get pretty big. That's half a gigabyte, 00:11:08.800 |
which is massive. And then we want to get memory again for our index. Let's see what we have. 00:11:18.960 |
So now we have eight megabytes. So that's a massive draw. That's eight over 512. 00:11:32.800 |
So it's a 98.4% reduction in size, which is huge, which is really cool. 00:11:47.680 |
Now, we-- like I said before, we have this composite index where we have both the IVF step 00:11:52.480 |
and a PQ step. So we're going to have a look at how we can implement that, and we'll see 00:11:56.880 |
that memory decrease will be slightly less significant because it's a more complex index, 00:12:03.200 |
although very small still. And then the speed decrease will be absolutely insane. So we'll go 00:12:10.720 |
into that. So now we have nlist, which is the number of cells within our IVF index. 00:12:22.320 |
So if we just have a quick look at what that might look like. So imagine these are our PQ vectors 00:12:28.320 |
on a 2D space. Adding IVF essentially does this. So we add all these separating cells called 00:12:36.640 |
Voronoi cells, and each of the vectors within each cell is assigned to the vector to that cell 00:12:43.440 |
centroid. Now, here, you see the magenta dot at the bottom, that's our query vector, and it's 00:12:51.280 |
landed within that cell that you see highlighted. Now, because it's landed in there, we only compare 00:12:58.800 |
it to those vectors within that cell that are also assigned to the same cell. And that's what we're 00:13:05.520 |
doing with nlist. We're saying how many of those cells we would like to have. Now, we're going to 00:13:10.960 |
go with 2,048, and it's worth noting that this must be greater than or equal to kthon, which is 00:13:22.720 |
equal to 2 to the power of n bits. So in this case, we could go with 256 to be fair. Let's 00:13:33.680 |
go with that for now and we'll see how it goes. And then we initialize our index like this. So we 00:13:38.720 |
go index equals FICE index IVFPQ. And in here, we need to pass our vectors. So in this case, 00:13:50.320 |
we will just-- when I say vectors, I mean the index that contains our vectors. So you typically 00:13:58.720 |
see this written as quantizer, but I'm going to write it as vectors because I think it's a little 00:14:02.480 |
bit less confusing. So we'll do FICE index flat L2. So we're just going to use these as our starting 00:14:09.520 |
point. But that doesn't mean we're storing these as they are in our index. They actually get 00:14:15.120 |
processed using the PQ step as we already did before. So we have Vects, D, nlist, M, and nbits. 00:14:30.320 |
OK, so we now have our index and we can see if it's trained like we did before. Obviously, 00:14:36.160 |
it will not be because we are still using PQ again. So we need to train that and we do the 00:14:42.880 |
same as we did before. So train xp. After that, we add our vectors and we can get our results. 00:14:53.680 |
So index.search xqk. OK, now we haven't seen how fast that is, so let's have a look. We'll 00:15:05.440 |
just time it again. OK, and although the actual search took longer, it's because we're looping 00:15:14.000 |
through. So we had 10,000 loops for this test and we got this number, which is insanely small. 00:15:21.600 |
So 0.001. No, that's not right. What are we on? This. 0.1 milliseconds. Sorry. 00:15:35.200 |
So 0.1 milliseconds. What did we get before? It was here. So this is using our flat index, 00:15:45.600 |
20 milliseconds. If we go up a little further, we see 3.3 milliseconds for that search up there. 00:15:51.600 |
So yeah, super fast. And then if we check the recall, so we'll use the same as what we used 00:16:00.400 |
before. So where is our sun here? Like I said, this can go up and down. So at 24 this time, 00:16:08.160 |
it's pretty, pretty bad. But I mean, like I said, PQ isn't the best for recall. But what we can do 00:16:15.920 |
is actually, because we're using IVF, we can increase the number of cells that we search 00:16:22.000 |
through. So you can kind of see that happening here. So using the example from before, we solve 00:16:27.520 |
an n-probe value of 1, which is the default. Increase it to 2, 3, 4, 5, and so on and so on. 00:16:33.600 |
And we can increase that all the way up to search all of our n-probe values. Of course, 00:16:37.920 |
that is kind of a waste of time, because if you're searching through all of your cells, 00:16:42.640 |
you're basically just doing a flat search with the additional overhead of having an IVF index. So 00:16:47.680 |
it's even slower. We can use that information. So we know that if we set our index.n-probe 00:16:56.400 |
equal to 2048, which is the number of cells that we-- the maximum number of cells that we set. 00:17:02.720 |
Or did we use 2, 5, 6? No, we used 2, 5, 6, so the endless value. 00:17:07.360 |
So we can just put endless here. If we do that and we do a search again, 00:17:12.640 |
xq and k, and we go dist i. Do that again. And let's come up here, get our recall. So we see 00:17:26.080 |
that the maximum recall we're going to get with this instance is 32%. Now, this is pretty low, 00:17:32.640 |
so working on it before, I was getting 50%, 52%. It's also a case of there is some randomness in 00:17:42.000 |
what you're going to output from your index. But of course, again, we can also increase n bits 00:17:47.600 |
or do some other things. So we can increase n bits, try decreasing m, and see how things work. 00:17:56.720 |
But it's also worth noting with IVFPQ, the n bits value does have to be 8, at the moment at least. 00:18:03.600 |
It's just that it's something that's hardcoded into FICE. Unless you want to go into the FICE 00:18:09.200 |
code and then change that, you can remove it if you want. Now, what we want to do is obviously 00:18:13.600 |
not search through the entire index. So we're going to change n-probe. Let me also bring this 00:18:20.960 |
down. We're going to change n-probe to something that is not quite as high, so maybe 20. 00:18:30.160 |
We'll get 33, so that's high enough for us to get our maximum performance. But obviously, 00:18:36.880 |
it's going to be a lot quicker because we're searching through 20 cells rather than 256. 00:18:41.120 |
Let's try 10, 32, so increase it a little bit to 12, still 32, 16, 33. So around this area here, 00:18:55.360 |
so 14, 13. So 13 is our optimum n-probe value where we get the max recall out of that, 00:19:05.040 |
the quickest time. But of course, which parameters you use is completely up to you, 00:19:11.120 |
and it depends on your use case as well. So that's it for this video. Thank you very much 00:19:16.080 |
for watching. I hope it's been useful, and I will see you in the next one. Bye!