
Faiss - Introduction to Similarity Search


Chapters

0:00 Introduction
3:21 Code Overview
6:15 Index Flat L2
8:10 Index Training
9:02 Adding Vectors
12:10 Query Time
13:00 Voronoi Cells
16:15 Coding
20:16 Index IVF
22:43 Product Quantization Index
25:54 Implementing Product Quantization Index
30:38 Comparison


00:00:00.920 | Hi, welcome to this video.
00:00:03.120 | We're going to be covering Facebook AI Similarity Search, or Faiss.
00:00:06.760 | And we're going to be covering what Faiss is
00:00:10.880 | and how we can actually begin using it.
00:00:14.480 | And we'll introduce a few of the key indexes that we can use.
00:00:18.920 | So just as a quick introduction to Faiss,
00:00:24.760 | as you can probably tell from the name, it's a similarity search
00:00:28.280 | library that we can use from Facebook AI
00:00:32.880 | that allows us to compare vectors
00:00:37.240 | with very high efficiency.
00:00:42.120 | So, if you've seen any of my videos before
00:00:46.000 | on building sentence embeddings and comparing sentence embeddings,
00:00:50.520 | in those videos, I just did a generic Python loop
00:00:54.480 | to go through and compare each embedding.
00:00:56.800 | And that's very slow.
00:00:58.760 | Now, if you're only working with maybe 100 vectors,
00:01:02.360 | it's probably OK, you can deal with that.
00:01:04.080 | But in reality, we're probably never going to be working
00:01:07.080 | with that small a data set.
00:01:08.880 | Facebook AI Similarity Search can scale to tens
00:01:13.040 | or hundreds of thousands of vectors, up to millions and even billions.
00:01:16.880 | So this is incredibly good
00:01:22.160 | for efficient similarity search.
00:01:26.400 | But before we get into it, I'll just sort of visualize
00:01:29.560 | what this index looks like.
00:01:32.880 | So if we imagine that we have
00:01:36.960 | all of the vectors that we have created and we put them into our
00:01:41.960 | similarity search index, now they could look like this.
00:01:47.000 | So this is only a three dimensional space.
00:01:50.000 | But in reality, there would be hundreds of dimensions here.
00:01:55.840 | In our use case, we're going to be using dimensions of 768.
00:02:02.160 | So, you know, there's a fair bit in there.
00:02:04.760 | Now, when we search,
00:02:11.240 | we would introduce a new vector into here.
00:02:15.200 | So let's say here, this is our query vector, xq.
00:02:19.520 | Now, if we were comparing every item here,
00:02:24.520 | we would have to calculate the distance
00:02:27.120 | between every single item.
00:02:29.400 | So we would calculate the distance between our query vector
00:02:33.920 | and every other vector that is already in there
00:02:36.400 | in order to find the vectors which are closest to it.
00:02:38.960 | Now, we can optimize this.
00:02:42.720 | We can decrease the number of dimensions
00:02:49.040 | in each of our vectors and do it in an intelligent way
00:02:51.800 | so they take up less space and the calculations are faster.
00:02:55.960 | And we can also restrict our search.
00:02:58.320 | So in this case, rather than comparing every single item,
00:03:01.880 | we might restrict our search to just this area here.
00:03:06.400 | And these are a few of the optimizations
00:03:10.320 | at a very high level that we can do with Faiss.
00:03:13.240 | So that's enough for the introduction to Faiss.
00:03:18.280 | Let's actually jump straight into the code.
00:03:21.640 | Okay, so this is our code.
00:03:23.800 | In here, this is how we are loading in
00:03:27.200 | all of our sentence embeddings.
00:03:29.440 | So I've gone ahead and processed them already
00:03:31.320 | 'cause they do take a little bit of time to actually build,
00:03:35.000 | but we're building them from this file here.
00:03:38.040 | We'll load this into Python as well,
00:03:39.800 | but I mean, it's pretty straightforward.
00:03:42.960 | It's just a load of sentences
00:03:45.880 | that have been separated by a newline character.
00:03:48.400 | And then in here, we have all of those NumPy binary files.
00:03:52.560 | Now, these NumPy binary files,
00:03:55.400 | like I said, we're getting them from GitHub,
00:03:57.040 | which is over here.
00:03:58.480 | That's where we're pulling them all in using this cell here.
00:04:02.960 | Now, that saves everything to file.
00:04:06.720 | And then we just read in each of those files
00:04:08.920 | and we append them all into a single NumPy array here.
00:04:14.160 | And that gives us these 14.5 thousand samples.
00:04:18.520 | Each embedding is a vector with 768 values inside.
00:04:23.520 | So that's how we're loading in our data.
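
As a rough sketch of that loading step (the `embeddings/` directory and file names here are assumptions for illustration, not the exact ones from the video):

```python
import numpy as np
from pathlib import Path

# Hypothetical layout: each .npy file holds one batch of embeddings.
files = sorted(Path("embeddings").glob("*.npy"))

# Stack every batch into a single (n_samples, 768) array.
sentence_embeddings = np.concatenate([np.load(f) for f in files], axis=0)
print(sentence_embeddings.shape)  # roughly (14500, 768)
```
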
00:04:27.560 | I'll also load in that text file as well.
00:04:29.840 | So we just want to do with open
00:04:32.000 | sentences.txt.
00:04:39.760 | And then we're just reading that in as a normal file.
00:04:43.680 | And we just write, I'm gonna put lines equals fp.read.
00:04:48.680 | And like I said, we're splitting that by newline characters.
00:04:52.400 | So we just write that.
00:04:55.120 | Sorry, it's sentences.
00:05:00.880 | And we see a few of those as well.
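
That step, written out (assuming the file is called sentences.txt):

```python
# Read the file and split it into sentences on newline characters.
with open("sentences.txt", "r") as fp:
    lines = fp.read().split("\n")

print(lines[:3])  # preview a few of them
```
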
00:05:06.640 | Okay.
00:05:09.880 | Now, to convert from those sentences
00:05:13.200 | into those sentence embeddings,
00:05:15.440 | I need to import this anyway for later on
00:05:17.600 | when we're building our query vectors.
00:05:18.800 | I'll just show you how I do that now.
00:05:21.240 | All we do is from sentence transformers,
00:05:23.640 | which is the library we're using to create those embeddings,
00:05:27.480 | import SentenceTransformer.
00:05:36.720 | And then our model, we're using SentenceTransformer again.
00:05:40.040 | And we're using the bert-base-nli-mean-tokens model.
00:05:45.040 | Okay.
00:05:47.920 | So that's how we initialize our model.
00:05:51.160 | And then when we're encoding our text,
00:05:52.720 | we'll see in a moment, we just write model.encode,
00:05:56.680 | and then we write something in here, hello world.
00:05:59.800 | Okay, and that will encode,
00:06:01.680 | that will give us a sentence embedding.
00:06:04.400 | Okay.
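
A minimal sketch of that model setup, using the sentence-transformers library:

```python
from sentence_transformers import SentenceTransformer

# The model that produced our 768-dimensional sentence embeddings.
model = SentenceTransformer("bert-base-nli-mean-tokens")

# Encoding a sentence gives us back a sentence embedding.
embedding = model.encode("hello world")
print(embedding.shape)  # (768,)
```
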
00:06:05.240 | So that is what we have inside here.
00:06:09.000 | We just have the sentence embeddings
00:06:10.720 | of all of our lines here.
00:06:12.440 | Now, I think we have everything we need to get started.
00:06:19.400 | So let's build our first Faiss index.
00:06:22.880 | So the first one we're gonna build
00:06:26.760 | is called IndexFlatL2.
00:06:31.760 | And this is a flat index,
00:06:34.120 | which means that all the vectors are just flat vectors.
00:06:38.120 | We're not modifying them in any way.
00:06:40.320 | And the L2 stands for the distance metric
00:06:44.200 | that we're using to measure the similarity of each vector
00:06:49.200 | or the proximity of each vector.
00:06:52.600 | And L2 is just Euclidean distance.
00:06:54.720 | So it's a pretty straightforward function.
00:06:59.720 | Now, to initialize that, we just write faiss.
00:07:03.520 | So we imported, no, so we need to import faiss.
00:07:07.120 | And then we write index = faiss.IndexFlatL2.
00:07:14.280 | And then in here, we need to pass the dimensionality
00:07:17.560 | of our vectors or our sentence embeddings.
00:07:21.240 | Now, what is our dimensionality?
00:07:23.400 | So each one is 768 values long.
00:07:28.920 | So if we'd like a nicer way of writing that out,
00:07:33.920 | we put sentence_embeddings.shape[1].
00:07:39.720 | And our index requires that
00:07:46.560 | in order to be properly initialized.
00:07:49.080 | So do that.
00:07:50.280 | That will be initialized.
00:07:54.240 | Let me run it again.
00:07:55.640 | I think my notebook just restarted.
00:08:00.640 | It did restart, that's weird.
00:08:04.920 | Okay, one minute.
00:08:05.800 | So that's going to initialize the index.
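
So the initialization we just walked through, roughly:

```python
import faiss

d = sentence_embeddings.shape[1]  # dimensionality of our vectors, 768

# A flat index: vectors are stored unmodified and compared with L2 distance.
index = faiss.IndexFlatL2(d)
```
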
00:08:14.360 | And there is one thing that we need to be aware of.
00:08:16.800 | So sometimes with these indexes,
00:08:19.840 | we will need to train them.
00:08:22.920 | So if the index is going to do any clustering,
00:08:26.040 | we will need to train that clustering algorithm on our data.
00:08:29.520 | And now in this case, we can check
00:08:32.520 | if an index needs training or is trained already
00:08:36.000 | using the is_trained attribute.
00:08:37.840 | And we'll see with this index,
00:08:41.720 | because it's just a flat L2 index,
00:08:45.600 | it's not doing anything special.
00:08:47.960 | We'll see, because it's not doing anything special,
00:08:52.040 | we don't need to train it.
00:08:52.920 | And we can see that when we write is_trained,
00:08:56.160 | it says it's already trained,
00:08:57.440 | just means that we don't actually need to train it.
00:08:59.560 | So that's good.
00:09:01.320 | Now, how do we add our vectors, our sentence embeddings?
00:09:07.640 | All we need to do is write index.add,
00:09:09.960 | and then we just add embeddings like so.
00:09:12.520 | So pretty straightforward.
00:09:14.760 | So add sentence embeddings.
00:09:18.840 | And then from there,
00:09:20.840 | we can check that they've been added properly
00:09:24.280 | by looking at the ntotal value.
00:09:26.440 | So this is number of embeddings or vectors
00:09:29.240 | that we have in our index.
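
Those steps look something like this:

```python
print(index.is_trained)  # True - a flat index needs no training

# Add all of our sentence embeddings to the index.
index.add(sentence_embeddings)
print(index.ntotal)  # number of vectors now stored in the index
```
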
00:09:31.080 | And with that, we can go ahead and start querying.
00:09:34.560 | So let's first create a query.
00:09:37.480 | So we'll do xq, which is our query vector.
00:09:42.000 | And we want to do the model.encode that we did before.
00:09:45.320 | Now, I'm going to write someone sprints with a football.
00:09:50.320 | Okay.
00:09:53.760 | That's going to be our query vector.
00:09:57.600 | And to search, we do this.
00:10:00.280 | So we write D, I = index.search(xq).
00:10:05.280 | And then in here, we need to add k as well.
00:10:09.720 | So k, let me define it above here.
00:10:14.240 | So k is the number of items or vectors,
00:10:19.080 | similar vectors that we'd like to return.
00:10:20.800 | So I'm going to want to return four.
00:10:23.120 | So with here, with this,
00:10:25.720 | we will return four index IDs into this I variable here.
00:10:30.720 | I'm going to time it as well,
00:10:33.600 | just so you see how long it takes.
00:10:35.320 | And let's print I.
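
The full query step, sketched out:

```python
k = 4  # number of nearest vectors to return

# Encode the query; the index expects a 2D array of vectors.
xq = model.encode(["Someone sprints with a football"])

# D holds the distances, I holds the IDs of the nearest vectors.
D, I = index.search(xq, k)
print(I)
```
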
00:10:42.200 | You can see that we get these four items.
00:10:44.960 | Now, these align with our lines.
00:10:48.520 | So the text that we have up here, that will align.
00:10:52.080 | So what we can do is we can print all of those out.
00:10:56.480 | So let's do it.
00:10:58.360 | And then in here, we want to write lines[i] for i.
00:11:05.120 | Sorry, let me amend that.
00:11:11.440 | lines[i] for i in I.
00:11:15.840 | Okay.
00:11:17.560 | Ah, sorry.
00:11:22.560 | So this is I[0] here.
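
So that line, written out (I[0] is the row of results for our single query):

```python
# Map each returned ID back to its original sentence.
print([lines[i] for i in I[0]])
```
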
00:11:24.880 | Okay, so these are the sentences
00:11:27.760 | or the similar sentences that we got back.
00:11:29.280 | And we see, obviously, it seems to be working pretty well.
00:11:31.520 | All of them talking about football
00:11:33.320 | or being on a football field.
00:11:36.160 | So that looks pretty good, right?
00:11:40.240 | The only problem is that this takes a long time.
00:11:42.720 | We don't have that many vectors in there.
00:11:45.080 | And it took 57.4 milliseconds.
00:11:48.400 | So it's a little bit long
00:11:51.520 | and something that we can actually improve.
00:11:55.320 | Okay, so before we move on to the next index,
00:12:00.560 | I just want to have a look at the sort of speed
00:12:02.960 | that we would expect from this.
00:12:05.640 | This is a very small data set.
00:12:06.880 | So what else could we expect?
00:12:10.760 | So if we go over here, I've already written all this code.
00:12:14.640 | If you'd like to go through this notebook,
00:12:16.560 | I'll leave a link in the description.
00:12:18.480 | So come down here, we have this flat L2 index,
00:12:23.560 | and this is the query time.
00:12:25.120 | So this is for a randomly generated vector
00:12:28.400 | with a dimension size of 100.
00:12:30.320 | And this is the number of vectors within that index.
00:12:35.320 | So we go up to 1 million here.
00:12:38.320 | And this is the query time in milliseconds.
00:12:40.880 | You can see, you know, it increases quite quickly.
00:12:44.560 | Now this is in Faiss, but it's still an exhaustive search.
00:12:47.960 | We're not really optimizing it the way we could.
00:12:52.040 | We're not using the approximate search capabilities
00:12:55.240 | of Faiss.
00:12:56.880 | So if we switch back over to Faiss,
00:13:00.280 | we can begin using that approximate search
00:13:05.160 | by adding partitioning into our index.
00:13:09.440 | Now, the most popular of these uses a technique
00:13:13.280 | very similar to something called Voronoi cells.
00:13:17.520 | I'm not sure how you pronounce it.
00:13:18.720 | I think that's about right.
00:13:20.960 | And I can show you what that looks like.
00:13:25.600 | So over here,
00:13:27.160 | if we go here, we have all of these.
00:13:32.680 | So this is called a Voronoi diagram.
00:13:37.680 | And each of the sort of squares
00:13:41.800 | or the cells that you see are called Voronoi cells.
00:13:44.640 | So here we have Voronoi cells.
00:13:49.640 | And that is just what you see here.
00:13:54.960 | So this, this, all of these kind of squares
00:13:59.680 | are each a cell.
00:14:01.800 | Now, as well as those, we also have our centroids.
00:14:05.200 | So I'm just gonna write this out instead.
00:14:07.680 | So centroids.
00:14:08.960 | And these are simply the centers of those cells.
00:14:13.320 | Now, when we introduce a new vector
00:14:17.880 | or our query vector into this,
00:14:21.360 | what we're doing is essentially,
00:14:23.200 | so we have our query vector,
00:14:25.040 | and let's say it appears here.
00:14:29.840 | Now, within each one of these cells,
00:14:32.000 | we actually have a lot of other vectors.
00:14:35.160 | So we could have, you know,
00:14:36.880 | we could have millions in each cell.
00:14:38.720 | So there's a lot in there.
00:14:42.840 | And if we were to compare that query vector,
00:14:45.520 | this thing here, to every single one of those vectors,
00:14:50.360 | it would obviously take a long time.
00:14:52.160 | We're going through every single one.
00:14:54.160 | We don't want to do that.
00:14:55.320 | So what this approach allows us to do
00:14:58.440 | is instead of checking against every one of those vectors,
00:15:01.360 | we just check it against every centroid.
00:15:03.800 | And once we figure out which centroid is the closest,
00:15:08.360 | we limit our search scope to only vectors
00:15:14.000 | that are within that centroid Voronoi cell.
00:15:18.560 | So in this case, it would probably be this centroid here,
00:15:23.000 | which is the closest.
00:15:24.480 | And then we would just limit our search
00:15:27.280 | to only be within these boundaries.
00:15:31.400 | Now, what we might find is maybe
00:15:34.440 | the closest vector in this cell is actually here,
00:15:37.360 | whereas the closest vector overall is right there.
00:15:40.040 | So in reality, this vector here, this one,
00:15:43.640 | might actually be a better approximation, or rather,
00:15:49.680 | it might be more similar to our query.
00:15:53.480 | And that's why this is approximate search,
00:15:56.400 | not exhaustive search,
00:15:58.000 | because we might miss out on something,
00:16:02.040 | but that is kind of outweighed
00:16:04.560 | by the fact that this is just a lot, a lot faster.
00:16:08.800 | So it's sort of pros and cons.
00:16:11.040 | It's whatever is going to work best for your use case.
00:16:15.560 | Now, if we want to implement that in code,
00:16:19.600 | first thing that we want to do
00:16:20.760 | is define how many of those cells we would like.
00:16:25.400 | So I'm going to go 50.
00:16:27.440 | So we use this nlist parameter.
00:16:29.680 | And then from there, we can set up our quantizer,
00:16:33.000 | which is almost, it's like another step in the process.
00:16:37.400 | So with our index,
00:16:41.840 | we are still going to be measuring the L2 distance.
00:16:44.560 | So we still actually need that index in there.
00:16:48.200 | So to do that, we need to write faiss.IndexFlatL2,
00:16:53.200 | and we pass our dimensions again,
00:16:58.560 | just like we did before.
00:16:59.760 | And like I said, that's just a step in the process.
00:17:04.120 | That's not our full index.
00:17:06.080 | Our full index is going to look like this.
00:17:09.080 | So we write index.
00:17:11.000 | And in here, we're going to have our faiss,
00:17:13.120 | and this is a new index.
00:17:14.640 | So this is the one that is creating those partitions.
00:17:16.920 | So we write IndexIVFFlat.
00:17:21.920 | And in there, we need to pass our quantizer,
00:17:28.080 | the dimensions, and also the nlist.
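
Putting that setup together, as a sketch:

```python
nlist = 50  # how many Voronoi cells to partition the index into

# The quantizer is the step that measures L2 distance to the cell centroids.
quantizer = faiss.IndexFlatL2(d)

# The IVF index partitions our vectors across the nlist cells.
index = faiss.IndexIVFFlat(quantizer, d, nlist)
```
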
00:17:35.440 | Okay, now, if you remember what I said before,
00:17:41.520 | we, in some cases, we'll need to train our index.
00:17:46.120 | Now, this is an example of one of those times.
00:17:49.520 | Because we're doing the clustering
00:17:51.800 | and creating those Voronoi cells,
00:17:54.640 | we do need to train it.
00:17:56.040 | And we can see that because this is false.
00:17:58.320 | Now, to train it, we need to just write index.train,
00:18:03.320 | and then in here,
00:18:09.360 | we want to pass all of our sentence embeddings.
00:18:12.960 | So sentence embeddings, like so.
00:18:17.960 | Let's run that.
00:18:18.920 | It's very quick.
00:18:19.920 | And then we can write, it's trained.
00:18:22.960 | And we see that's true.
00:18:23.800 | So now our index is essentially ready to receive our data.
00:18:28.800 | So we do this exactly the same way as we did before.
00:18:35.320 | We write index.add,
00:18:37.280 | and we pass our sentence embeddings again.
00:18:41.560 | And we can check that everything is in there
00:18:43.320 | with index.ntotal.
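
Training and adding, roughly:

```python
print(index.is_trained)  # False - IVF has to learn the cell centroids first

index.train(sentence_embeddings)  # clusters the data to find the nlist centroids
print(index.is_trained)  # True

index.add(sentence_embeddings)
print(index.ntotal)  # all of our vectors are in
```
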
00:18:46.000 | Okay, so now we see that we have our index, it's ready,
00:18:49.600 | and we can begin querying it.
00:18:51.800 | So what I'm going to do is use the exact same query vector
00:18:56.800 | that we used before.
00:18:59.080 | Going to time it so that we can see how quick this is
00:19:02.600 | compared to our previous query.
00:19:04.680 | And we're actually going to write
00:19:08.720 | the exact same thing we wrote before.
00:19:10.640 | So can I actually just copy it?
00:19:14.480 | So I'll take that, bring it here.
00:19:24.080 | There we go.
00:19:26.120 | So now let's have a look.
00:19:27.920 | So, total: 7.22 milliseconds.
00:19:30.600 | So bring it up here, and we have 57.4.
00:19:33.760 | Now this is maybe a little bit slow.
00:19:36.960 | So we'll see that the times do vary a little bit
00:19:41.920 | quite randomly, but maybe that's a little bit slow,
00:19:46.920 | but it's probably pretty realistic.
00:19:49.640 | So that took 57 milliseconds.
00:19:54.320 | This one, seven.
00:19:56.640 | Now let's have a look.
00:19:57.520 | So these are the indexes we've got.
00:19:59.320 | Let's compare them to what we had before.
00:20:02.720 | And I believe they're all the same.
00:20:06.080 | So we've just shortened the time by a lot
00:20:10.360 | and we're getting the exact same results.
00:20:13.240 | So that's pretty good.
00:20:14.200 | Now, sometimes we will find
00:20:17.240 | that we do get different results.
00:20:18.320 | And a lot of the time that's fine,
00:20:20.520 | but maybe if you find the results are not that great
00:20:25.520 | when you add this sort of index,
00:20:27.760 | then that just means that this search
00:20:30.240 | is not exhaustive enough.
00:20:32.680 | Like we are using approximate search,
00:20:34.200 | but maybe we should approximate a little bit less
00:20:36.760 | and be slightly more exhaustive.
00:20:39.120 | And we can do that by setting the nprobe value.
00:20:43.400 | So nprobe, I'll explain it in a minute.
00:20:48.400 | So let me actually first just run this
00:20:52.400 | and we can see it will probably take slightly longer.
00:20:56.840 | So yeah, we get 15 milliseconds here.
00:20:59.840 | Of course, we get the same results again
00:21:02.680 | 'cause there were no accuracy issues here anyway.
00:21:06.320 | But let me just explain what that is actually doing.
00:21:10.680 | So in this case here,
00:21:13.200 | what you can see is an IVF search
00:21:18.160 | where we are using an nprobe value of one.
00:21:21.720 | So we're just searching one cell
00:21:26.320 | based on the first nearest centroid to our query vector.
00:21:30.560 | Now, if we increase this up to eight,
00:21:33.920 | or let's use a smaller number in this example.
00:21:37.720 | So maybe we increase it to four,
00:21:41.200 | our four nearest centroids.
00:21:44.400 | So I would say probably these,
00:21:48.280 | this one, this one, this one,
00:21:52.680 | and the one we've already highlighted.
00:21:54.560 | All of those would now be in scope
00:21:58.840 | because our nprobe value,
00:22:00.200 | so the number of cells that we are going to search is four.
00:22:04.000 | Now, if we increase it again to say six,
00:22:07.960 | these two cells might also be included.
00:22:10.440 | Now, of course, when we do that,
00:22:12.960 | we are searching more,
00:22:14.800 | so we might get a better performance, better accuracy.
00:22:18.840 | But in terms of time,
00:22:23.320 | that's also going to increase,
00:22:27.200 | and we don't want the time to increase.
00:22:30.920 | So there's a trade-off between those two.
00:22:34.080 | In our case, we don't really need to increase this,
00:22:36.520 | so don't really need to worry about it.
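
For reference, setting it is just an attribute assignment. The exact value used in this run isn't stated here; the 10 below matches the comparison graph later on:

```python
# Search more cells per query: more exhaustive, but slightly slower.
index.nprobe = 10

D, I = index.search(xq, k)  # same query as before, wider search scope
```
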
00:22:41.280 | So that is the IVF index.
00:22:47.680 | And we have one more that I want to look at,
00:22:53.360 | and that is the product quantization index.
00:22:57.640 | So this is actually, so we use IVF,
00:23:02.640 | and then we also use product quantization.
00:23:05.640 | So it's probably better if I try and draw this out.
00:23:10.640 | So when we use product quantization,
00:23:15.880 | imagine we have one vector here.
00:23:19.080 | So this is our vector.
00:23:21.960 | Now, the first step in product quantization
00:23:25.360 | is to split this into sub-vectors.
00:23:28.440 | So we split this into several and then we take them out.
00:23:32.120 | We pull these out and they are now
00:23:36.160 | their own sort of mini-vectors.
00:23:38.640 | And this is just one vector that I'm visualizing here,
00:23:43.640 | but we would obviously do this with many, many vectors.
00:23:48.000 | So there would be many, many more.
00:23:51.360 | So in our case, that's just under 15,000.
00:23:56.360 | Now, that means that we have a lot of these sub-vectors.
00:24:01.400 | And what we do with these is we run them
00:24:10.520 | through their own clustering algorithm.
00:24:12.880 | So what we do is we end up getting clusters
00:24:16.400 | and each of those clusters is going to have a centroid.
00:24:21.400 | So this one would also be run through one.
00:24:23.600 | So each subset of vector slices
00:24:27.840 | is going to be run through its own clustering algorithm
00:24:32.840 | creating these centroids.
00:24:35.080 | And these centroids are smaller in size
00:24:38.920 | than the original sub-vectors here.
00:24:42.200 | And what we do is for each of these sub-vectors,
00:24:47.200 | so each of these sub-vectors, they get pulled into here.
00:24:52.160 | So maybe this one is here
00:24:55.200 | and it gets assigned to its nearest centroid.
00:24:58.560 | And then we take that assignment
00:25:02.660 | all the way back over here and add it into our vector.
00:25:07.660 | So this is centroid three.
00:25:12.480 | For example, and when I say assign it back,
00:25:17.160 | it's probably the wrong way to think about it.
00:25:19.920 | Maybe it's more like this.
00:25:22.200 | So it becomes a new vector
00:25:25.400 | built from those centroid IDs.
00:25:31.800 | Okay, so this would be three.
00:25:34.120 | Now, what that does is essentially reduces
00:25:38.320 | the size of our vectors, but pretty significantly,
00:25:43.260 | depending on what dimensions we use there.
00:25:47.220 | So, let's skip back to the code.
00:25:52.360 | Let's implement that.
00:25:53.460 | Now we need to define two new variables here.
00:25:58.440 | So M, which is going to be the number of centroid IDs
00:26:02.000 | in the final compressed vector.
00:26:04.000 | So now one thing that we do need to know with M
00:26:08.640 | is that
00:26:12.240 | D must be divisible by M.
00:26:15.320 | So what is our D value?
00:26:17.880 | Is it 100? I can't remember.
00:26:20.580 | Where are we?
00:26:23.420 | Let me check.
00:26:25.440 | So 768.
00:26:30.040 | Now we should be able to divide that into eight, I think.
00:26:33.920 | Yeah, so this is good.
00:26:36.800 | We can use eight for M,
00:26:38.600 | but we couldn't use something like five
00:26:40.720 | because if we do five, we see that D doesn't fit.
00:26:45.480 | So five doesn't fit nicely into D, whereas eight does.
00:26:50.180 | So D must be a multiple of M.
00:26:55.180 | Otherwise we're going to get an error.
00:26:59.640 | And that's because of the way that those vectors
00:27:04.440 | are broken down into the final centroid ID vectors.
00:27:09.440 | And we also need to specify the number of bits
00:27:13.220 | within each of those centroids.
00:27:15.320 | So this value, we can use what we want.
00:27:17.800 | I'm going to use eight.
00:27:19.680 | And then we can set up our index and also the quantizer.
00:27:24.160 | So we use a quantizer as we did before.
00:27:26.640 | So the quantizer is going to be faiss.IndexFlatL2(d).
00:27:31.640 | And also our index here is going to be,
00:27:40.640 | so this is a new one.
00:27:42.600 | This is IndexIVFPQ.
00:27:45.160 | So it's not a flat vector anymore, which is the full vector.
00:27:49.240 | It's a quantized vector.
00:27:52.080 | So where we have reduced the size of it through this,
00:27:57.080 | through the method I explained before, where we drew out.
00:28:00.800 | Now we need to pass a few arguments into here.
00:28:07.200 | First one is the quantizer.
00:28:10.960 | So the quantizer, D, which is our dimensionality,
00:28:15.920 | nlist, M, and bits.
00:28:20.840 | So pass all of those to our index.
00:28:23.760 | Sorry, we need to put faiss there as well.
00:28:28.080 | And there we go.
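
The full setup for this compressed index, as a sketch:

```python
m = 8     # number of sub-vectors per vector; d must be divisible by m
bits = 8  # bits per centroid ID, so 2**bits centroids per sub-space

assert d % m == 0  # 768 / 8 works, whereas 5 would raise an error

quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, bits)
```
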
00:28:31.720 | So we now have our index again.
00:28:34.400 | You may have guessed that we might need to train this one.
00:28:37.680 | There we go.
00:28:40.840 | So to train it, we'll just write index.train.
00:28:44.400 | That's our sentence embeddings.
00:28:49.440 | Okay, so it might take a little bit longer this time.
00:28:52.480 | There we go.
00:28:53.360 | And then we can add our vectors.
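
And training, adding, and querying look just like before:

```python
index.train(sentence_embeddings)  # learns the IVF cells and the PQ centroids
index.add(sentence_embeddings)

D, I = index.search(xq, k)  # much faster, slightly approximate results
```
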
00:28:55.360 | Now, after adding those,
00:28:57.920 | let's see how quick this is.
00:29:00.520 | Should be a lot quicker, or a fair bit quicker.
00:29:03.040 | It's hard to get much quicker than the last one.
00:29:05.480 | So we're going to use the same code as before.
00:29:08.120 | So I'm going to take this right down here.
00:29:12.880 | See, 2.86.
00:29:15.040 | So we've gotten a lot faster.
00:29:18.320 | So we've gone from, what's up here?
00:29:21.960 | 57 milliseconds down to two.
00:29:25.960 | Now, there is one thing here.
00:29:29.720 | These values are now different.
00:29:32.280 | So the accuracy has decreased.
00:29:37.480 | So if we, where is the last one here?
00:29:39.680 | So you can see that we are getting,
00:29:44.560 | so we have the 190, we still have that one,
00:29:47.160 | and we have the 12465,
00:29:48.960 | but these two at the front are now different.
00:29:51.640 | And this is just, it's one of the trade-offs
00:29:54.480 | of accuracy versus speed.
00:29:57.400 | So if we come down here,
00:30:00.400 | let's give that a go.
00:30:02.640 | Let's have a look at what we are pulling through.
00:30:06.240 | So I'll copy this again.
00:30:08.160 | Let's just see.
00:30:12.320 | So we have these.
00:30:14.240 | I mean, although the accuracy has decreased technically,
00:30:19.000 | because it's not getting the same results
00:30:20.560 | as the exhaustive search,
00:30:22.160 | they're still pretty good results.
00:30:23.840 | So I mean, nonetheless, I think that is pretty cool.
00:30:28.240 | So let's have a look at,
00:30:31.880 | let's compare this to our two previous methods
00:30:36.880 | in terms of, as we did before, the graphs.
00:30:40.600 | So here is that final one.
00:30:42.680 | So we have IVF PQ along the bottom.
00:30:45.280 | Yeah, it's a lot faster, right?
00:30:48.200 | And then we have IVFFlat with an nprobe value of 10,
00:30:54.320 | much faster than L2,
00:30:56.320 | but still not quite as fast as PQ.
00:31:01.280 | And then we have flat L2 at the top, which is obviously the slowest.
00:31:04.120 | And just as well, just be aware,
00:31:06.440 | on the left here, we have a large scale.
00:31:08.960 | So the differences are pretty significant
00:31:11.800 | when we go to the 1 million mark.
00:31:13.960 | So I think that's it for this video.
00:31:19.920 | So I think obviously Faiss is pretty cool,
00:31:24.280 | definitely really useful,
00:31:25.320 | and I think we're definitely going to explore it more
00:31:28.800 | in the future.
00:31:29.920 | So for now, that's it.
00:31:33.400 | So thank you for watching,
00:31:34.520 | and I'll see you in the next one.