Faiss - Vector Compression with PQ and IVFPQ (in Python)

00:00:00.000 | Hi, welcome to this video. We're going to have a look at how we can implement a product quantization

00:00:06.800 | index or index PQ in FICE, and we're also going to have a look at how we can implement a composite

00:00:13.920 | index using an inverted file list, IVF, and also PQ together to improve our results or our

00:00:23.200 | performance even further. Now, we'll just have a quick look at this. If you watched the previous

00:00:28.240 | video on the logic behind PQ, you'll recognize this, but this is just demonstrating the speed

00:00:33.440 | increase that we can get and also mainly the memory increase, which is the important factor in

00:00:39.680 | PQ. And obviously, there's a huge increase here. So the speed increases is a five to six times

00:00:46.960 | increase with just PQ, and the memory decrease is huge. It's like a 97% reduction in memory.

00:00:55.040 | Now, I'm not going to go through how all of that works because we already did it in the last video.

00:01:01.200 | So there is a link to that video in the description. So if you'd like to go through that,

00:01:07.200 | go ahead and go through it. But if you just want to see how this works in FICE,

00:01:11.200 | then continue and we'll jump straight into it. Okay, so the first thing that we need to do is

00:01:19.520 | actually get our data. So we're going to be using the SIFT1M dataset. Again, in the description,

00:01:25.840 | there's a link to a notebook where you can download and also load that into your own notebook

00:01:32.080 | if you'd like to do that. But I should say it's a pretty straightforward, very popular

00:01:38.320 | dataset for these sort of examples I use all the time. And I'll just show you the sort of shape.

00:01:45.040 | So XB, it is the vectors that we'll be indexing. So we're going to index these vectors in our index

00:01:53.760 | in FICE and search through them. Typically, it has 1 million vectors. I'm going to stick with

00:02:00.240 | 500K because it can just take a little bit of time for everything to load.

00:02:03.680 | Although that being said, I don't think it's that bad. So let me, yeah, you know what? Let's just

00:02:11.200 | go with 1 million. It's more interesting. And then XQ, I'm just using a single vector here.

00:02:19.760 | So we just have one vector in there. As you see, dimensionality for all of our vectors here is 128,

00:02:25.280 | which is pretty typical, but it's definitely on the lower end for dense vectors. So let's first

00:02:32.320 | initialize our PQ index. So we need to import FICE and we'll get our dimensionality, which is

00:02:39.760 | XB.shape1. Now I'm using this slightly different syntax to usual to align better with the PQ

00:02:50.320 | notation. Usually, you can go with lowcase D and you can go with that. It's fine. It's not really

00:02:55.200 | an issue. I'm just going to go with uppercase for now to align better to that notation if you are

00:03:01.520 | following on from the previous video. And we're going to set M equals to 8, so that's how many

00:03:06.320 | subvectors we have. Now, typically, we assert that D is divisible by M, but we already know it is,

00:03:13.200 | so I'm going to skip that. But that's something that you do need to make sure that D is divisible.

00:03:20.240 | In fact, you know what? I'm going to put it in anyway. So let's assert that D is divisible by M.

00:03:27.040 | Okay, just put this in because otherwise we are not splitting our vectors or subvectors equally,

00:03:33.520 | and we will not be allowed to do that. So we do need to make sure that's always the case.

00:03:37.840 | And then we're going to set the nBits value. So this is the number of bits per subquantizer. So

00:03:45.840 | if you were watching previous videos, we would see this as, so the K star value is equal to 2

00:03:56.240 | to the power of nBits. Okay, so the number of centroids that we have within each subquantizer.

00:04:03.920 | And then we initialize our index. So index equals FICE, index PQ, D, M, and nBits.

00:04:13.040 | Okay, that's our index ready to go. And what we need to do now is actually add our vectors. Now,

00:04:23.520 | you have to think with product quantization, we are clustering. Our subquantizers are using

00:04:30.480 | clustering. So we do actually need to train it. So we can see if an index needs to be trained or not

00:04:37.840 | by using this. So we write index is trained. In this case, is trained is false. So that means we

00:04:42.880 | need to train it before we add our vectors. So to do that, we need to write index train,

00:04:51.120 | and then we pass it xB. We run that. And this can take a little bit of time. It shouldn't be too

00:04:56.080 | long. But if you start increasing the nBits value, which is probably a good idea. I mean, nBits of

00:05:06.160 | maybe 11 is actually what is recommended in the PQ paper. But I'm not using it here because it

00:05:14.480 | takes a long time to train and build your vectors. So I'm avoiding it here. But if you're using this,

00:05:24.400 | I definitely recommend trying those larger nBits values. It's also worth noting that

00:05:32.640 | when we're using composite index of IVF and PQ, we can't use an nBits value of greater than

00:05:40.080 | 8. But we'll move on to that anyway. That has to be something that they've hard-coded into FICE.

00:05:47.280 | For now, I think they should be removing that limitation at some point. So we have now trained,

00:05:54.320 | and now we can add our vectors. So now what we want to do is we'll return, say, the top

00:06:03.280 | k most similar vectors. We'll go with k of 100. And then we want to do our search.

00:06:10.320 | So we're going to go distance. So typically, we'd write d here. But obviously, we're already using

00:06:17.040 | d. So I'm going to write distance equals index.search. And then in here, we get past our

00:06:24.400 | query. And we also pass k. And that will return the k nearest neighbors in that search. So in here,

00:06:35.120 | if we write shape, we see that we have 100 of those. And then distances here are the actual

00:06:41.120 | distances. So i are the indices of the nearest neighbors. Distance is the actual distances that

00:06:47.840 | they have returned. Now let's go ahead. And what I'm going to do is I'm going to time it. So we

00:06:54.240 | can use this time function to essentially run this cell multiple times and get the average time

00:07:02.240 | taken to run it, which is quite useful when we're comparing different indexes.

00:07:07.440 | And we'll do index.search. And we want x, q, and k. Now we see that that took 3.34 milliseconds.

00:07:21.440 | So reasonably fast. Let's compare that to a flat index. So we'll do L2 index equals FICE

00:07:29.760 | index flat L2, d with a capital, L2 index dot add. And we want to add all of our data. So it's b.

00:07:44.880 | And let's time that again. So I'm just going to copy this.

00:07:50.560 | Here. And we'll just replace that with L2 index. And let's see what we get. OK. So an average of

00:07:58.320 | 19.5 milliseconds. So already, this is pretty quick, right? Which is cool. Now we can check

00:08:06.720 | the performance of our index as well. Now this varies from all the times I've tested it. But we

00:08:13.840 | should get something around maybe 50. So 50% recall isn't exactly cutting edge. But that's PQ.

00:08:22.480 | That is how it is. If we want high recall, we can increase n bits to get that. But it's definitely

00:08:29.920 | a drawback of using PQ. OK. So we want to sum one value for every value that we find in our index.

00:08:41.920 | So in the return indexes for our PQ results, if they are in the L2 results. So we're just saying,

00:08:51.840 | you know, how many of the results returned by our PQ index were also returned by our L2 index,

00:08:57.680 | which we are seeing as the 100% recall. So L2 index, 100% recall. And here we're seeing, OK,

00:09:06.080 | how many of those do we match to using our PQ index. If i in L2 i. So I actually need to just

00:09:16.240 | copy this, pull it down here. So when we use a time it, we can't get any variables out from it,

00:09:24.160 | which is, I don't know why, but it's a little bit annoying. So we will get L2 this and L2 i.

00:09:34.560 | OK. So this time we get 38% recall. OK. So let's now have a look at the memory usage

00:09:47.520 | of each of these. So I'm just going to define a function, get to memory,

00:09:51.120 | so we can more easily compare those. And what we'll do is we'll just write our index to file,

00:10:00.080 | like so, index. And I'll just call it temp index now, because we're going to delete it straight

00:10:06.640 | away afterwards. We're just writing it to file so we can read, see how large the index is.

00:10:12.480 | And then we delete it again. The file size equals OS path dot get size to get size of that index in

00:10:23.120 | bytes, I think. So we'll do temp dot index. And then we don't really-- we're just using that to

00:10:31.680 | check the size, so we just want to delete that afterwards. So like so. And then we just return

00:10:40.720 | the file size. OK. So let's do that for both of our indexes. So we get the memory for the L2 index

00:10:52.880 | first. Let me-- so that's remove, not remote. OK. So we get pretty big. That's half a gigabyte,

00:11:08.800 | which is massive. And then we want to get memory again for our index. Let's see what we have.

00:11:18.960 | So now we have eight megabytes. So that's a massive draw. That's eight over 512.

00:11:32.800 | So it's a 98.4% reduction in size, which is huge, which is really cool.

00:11:47.680 | Now, we-- like I said before, we have this composite index where we have both the IVF step

00:11:52.480 | and a PQ step. So we're going to have a look at how we can implement that, and we'll see

00:11:56.880 | that memory decrease will be slightly less significant because it's a more complex index,

00:12:03.200 | although very small still. And then the speed decrease will be absolutely insane. So we'll go

00:12:10.720 | into that. So now we have nlist, which is the number of cells within our IVF index.

00:12:22.320 | So if we just have a quick look at what that might look like. So imagine these are our PQ vectors

00:12:28.320 | on a 2D space. Adding IVF essentially does this. So we add all these separating cells called

00:12:36.640 | Voronoi cells, and each of the vectors within each cell is assigned to the vector to that cell

00:12:43.440 | centroid. Now, here, you see the magenta dot at the bottom, that's our query vector, and it's

00:12:51.280 | landed within that cell that you see highlighted. Now, because it's landed in there, we only compare

00:12:58.800 | it to those vectors within that cell that are also assigned to the same cell. And that's what we're

00:13:05.520 | doing with nlist. We're saying how many of those cells we would like to have. Now, we're going to

00:13:10.960 | go with 2,048, and it's worth noting that this must be greater than or equal to kthon, which is

00:13:22.720 | equal to 2 to the power of n bits. So in this case, we could go with 256 to be fair. Let's

00:13:33.680 | go with that for now and we'll see how it goes. And then we initialize our index like this. So we

00:13:38.720 | go index equals FICE index IVFPQ. And in here, we need to pass our vectors. So in this case,

00:13:50.320 | we will just-- when I say vectors, I mean the index that contains our vectors. So you typically

00:13:58.720 | see this written as quantizer, but I'm going to write it as vectors because I think it's a little

00:14:02.480 | bit less confusing. So we'll do FICE index flat L2. So we're just going to use these as our starting

00:14:09.520 | point. But that doesn't mean we're storing these as they are in our index. They actually get

00:14:15.120 | processed using the PQ step as we already did before. So we have Vects, D, nlist, M, and nbits.

00:14:30.320 | OK, so we now have our index and we can see if it's trained like we did before. Obviously,

00:14:36.160 | it will not be because we are still using PQ again. So we need to train that and we do the

00:14:42.880 | same as we did before. So train xp. After that, we add our vectors and we can get our results.

00:14:53.680 | So index.search xqk. OK, now we haven't seen how fast that is, so let's have a look. We'll

00:15:05.440 | just time it again. OK, and although the actual search took longer, it's because we're looping

00:15:14.000 | through. So we had 10,000 loops for this test and we got this number, which is insanely small.

00:15:21.600 | So 0.001. No, that's not right. What are we on? This. 0.1 milliseconds. Sorry.

00:15:35.200 | So 0.1 milliseconds. What did we get before? It was here. So this is using our flat index,

00:15:45.600 | 20 milliseconds. If we go up a little further, we see 3.3 milliseconds for that search up there.

00:15:51.600 | So yeah, super fast. And then if we check the recall, so we'll use the same as what we used

00:16:00.400 | before. So where is our sun here? Like I said, this can go up and down. So at 24 this time,

00:16:08.160 | it's pretty, pretty bad. But I mean, like I said, PQ isn't the best for recall. But what we can do

00:16:15.920 | is actually, because we're using IVF, we can increase the number of cells that we search

00:16:22.000 | through. So you can kind of see that happening here. So using the example from before, we solve

00:16:27.520 | an n-probe value of 1, which is the default. Increase it to 2, 3, 4, 5, and so on and so on.

00:16:33.600 | And we can increase that all the way up to search all of our n-probe values. Of course,

00:16:37.920 | that is kind of a waste of time, because if you're searching through all of your cells,

00:16:42.640 | you're basically just doing a flat search with the additional overhead of having an IVF index. So

00:16:47.680 | it's even slower. We can use that information. So we know that if we set our index.n-probe

00:16:56.400 | equal to 2048, which is the number of cells that we-- the maximum number of cells that we set.

00:17:02.720 | Or did we use 2, 5, 6? No, we used 2, 5, 6, so the endless value.

00:17:07.360 | So we can just put endless here. If we do that and we do a search again,

00:17:12.640 | xq and k, and we go dist i. Do that again. And let's come up here, get our recall. So we see

00:17:26.080 | that the maximum recall we're going to get with this instance is 32%. Now, this is pretty low,

00:17:32.640 | so working on it before, I was getting 50%, 52%. It's also a case of there is some randomness in

00:17:42.000 | what you're going to output from your index. But of course, again, we can also increase n bits

00:17:47.600 | or do some other things. So we can increase n bits, try decreasing m, and see how things work.

00:17:56.720 | But it's also worth noting with IVFPQ, the n bits value does have to be 8, at the moment at least.

00:18:03.600 | It's just that it's something that's hardcoded into FICE. Unless you want to go into the FICE

00:18:09.200 | code and then change that, you can remove it if you want. Now, what we want to do is obviously

00:18:13.600 | not search through the entire index. So we're going to change n-probe. Let me also bring this

00:18:20.960 | down. We're going to change n-probe to something that is not quite as high, so maybe 20.

00:18:30.160 | We'll get 33, so that's high enough for us to get our maximum performance. But obviously,

00:18:36.880 | it's going to be a lot quicker because we're searching through 20 cells rather than 256.

00:18:41.120 | Let's try 10, 32, so increase it a little bit to 12, still 32, 16, 33. So around this area here,

00:18:55.360 | so 14, 13. So 13 is our optimum n-probe value where we get the max recall out of that,

00:19:05.040 | the quickest time. But of course, which parameters you use is completely up to you,

00:19:11.120 | and it depends on your use case as well. So that's it for this video. Thank you very much

00:19:16.080 | for watching. I hope it's been useful, and I will see you in the next one. Bye!

Faiss - Vector Compression with PQ and IVFPQ (in Python)

Chapters