Faiss - Introduction to Similarity Search
Chapters
0:00 Introduction
3:21 Code Overview
6:15 Index Flat L2
8:10 Index Training
9:02 Adding Vectors
12:10 Query Time
13:00 Voronoi Cells
16:15 Coding
20:16 Index IVF
22:43 Product Quantization Index
25:54 Implementing Product Quantization Index
30:38 Comparison
We're going to be covering Facebook AI Similarity Search, or Faiss. 00:00:14.480 |
And we'll introduce a few of the key indexes that we can use. 00:00:24.760 |
As you can probably tell from the name, it's a similarity search 00:00:28.280 |
and it's a library that we can use from Facebook AI 00:00:46.000 |
on building sentence embeddings and comparing sentence embeddings, 00:00:50.520 |
in those videos, I just did a generic Python loop 00:00:58.760 |
Now, if you're only working with maybe 100 vectors, 00:01:04.080 |
But in reality, we're probably never going to be working 00:01:08.880 |
Facebook AI Similarity Search can scale to tens 00:01:13.040 |
or hundreds of thousands of vectors, up to millions and even billions. 00:01:26.400 |
But before we get into it, I'll just sort of visualize 00:01:36.960 |
all of the vectors that we have created and put into our 00:01:41.960 |
similarity search index could look like this. 00:01:50.000 |
But in reality, there would be hundreds of dimensions here. 00:01:55.840 |
In our use case, we're going to be using dimensions of 768. 00:02:15.200 |
So let's say here, this is our query vector, xq. 00:02:29.400 |
So we would calculate the distance between our query vector 00:02:33.920 |
and every other vector that is already in there 00:02:36.400 |
in order to find the vectors which are closest to it. 00:02:42.720 |
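As a rough sketch of that exhaustive comparison, the kind of generic loop mentioned earlier (the function and variable names here are just placeholders):

```python
import numpy as np

def brute_force_search(xq, vectors, k=4):
    # Compute the L2 distance from the query to every stored vector,
    # then return the indices of the k closest ones.
    distances = [np.linalg.norm(xq - v) for v in vectors]
    return np.argsort(distances)[:k]
```

It works, but it compares against every single vector, which is exactly the cost we want to avoid at scale.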
We can improve, we can decrease the number of dimensions 00:02:49.040 |
in each of our vectors and do it in an intelligent way 00:02:51.800 |
so they take up less space and the calculations are faster. 00:02:58.320 |
So in this case, rather than comparing every single item, 00:03:01.880 |
we might restrict our search to just this area here. 00:03:10.320 |
at a very high level, that we can do with Faiss. 00:03:13.240 |
So that's enough for the introduction to Faiss. 00:03:29.440 |
So I've gone ahead and processed them already 00:03:31.320 |
'cause they do take a little bit of time to actually build, 00:03:45.880 |
that have been separated by a newline character. 00:03:48.400 |
And then in here, we have all of those NumPy binary files. 00:03:58.480 |
That's where we're pulling them all in using this cell here. 00:04:08.920 |
and we append them all into a single NumPy array here. 00:04:14.160 |
And that gives us these 14.5 thousand samples. 00:04:18.520 |
Each embedding is a vector with 768 values inside. 00:04:39.760 |
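As a rough sketch of that loading cell (the `embeddings_*.npy` filename pattern is an assumption, not the actual filenames):

```python
import glob
import numpy as np

# Load each NumPy binary chunk and stack them into one array of shape (~14500, 768).
chunks = [np.load(path) for path in sorted(glob.glob('embeddings_*.npy'))]
sentence_embeddings = np.vstack(chunks)

print(sentence_embeddings.shape)
```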
And then we're just reading that in as a normal file. 00:04:43.680 |
And we just write lines = fp.read(). 00:04:48.680 |
And like I said, we're splitting that by newline characters. 00:05:23.640 |
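Something like this, with the filename as an assumption:

```python
# Read the raw text file and split it into one sentence per line.
with open('sentences.txt', 'r') as fp:
    lines = fp.read().split('\n')
```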
which is the library we're using to create those embeddings, 00:05:36.720 |
And then our model, we're using sentence transformer again. 00:05:40.040 |
And we're using the bert-base-nli-mean-tokens model. 00:05:52.720 |
we'll see in a moment, we just write model.encode, 00:05:56.680 |
and then we write something in here, hello world. 00:06:12.440 |
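A minimal version of that, using the sentence-transformers library:

```python
from sentence_transformers import SentenceTransformer

# The same model that produced the stored embeddings.
model = SentenceTransformer('bert-base-nli-mean-tokens')

# Encoding a single sentence returns an array of shape (1, 768).
xq = model.encode(["hello world"])
```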
Now, I think we have everything we need to get started. 00:06:34.120 |
which means that all the vectors are just flat vectors. 00:06:44.200 |
that we're using to measure the similarity of each vector 00:07:03.520 |
So we need to import faiss. 00:07:03.520 |
And then we write index = faiss.IndexFlatL2. 00:07:07.120 |
And then in here, we need to pass the dimensionality 00:07:33.920 |
we put sentence_embeddings and we write .shape[1]. 00:08:14.360 |
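So the index creation looks like this:

```python
import faiss

d = sentence_embeddings.shape[1]  # 768, the dimensionality of our vectors
index = faiss.IndexFlatL2(d)
```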
And there is one thing that we need to be aware of. 00:08:22.920 |
So if the index is going to do any clustering, 00:08:26.040 |
we will need to train that clustering algorithm on our data. 00:08:32.520 |
if an index needs training or is trained already 00:08:47.960 |
We'll see, because it's not doing anything special, 00:08:52.920 |
And we can see that when we write index.is_trained, 00:08:57.440 |
it returns True, which just means that we don't actually need to train it. 00:09:01.320 |
Now, how do we add our vectors, our sentence embeddings? 00:09:20.840 |
we can check that they've been added properly 00:09:31.080 |
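A minimal sketch of adding the vectors and checking they made it in:

```python
# A flat index does no clustering, so it is already 'trained'.
print(index.is_trained)  # True

# Add all of the sentence embeddings, then confirm how many vectors are stored.
index.add(sentence_embeddings)
print(index.ntotal)  # ~14500
```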
And with that, we can go ahead and start querying. 00:09:42.000 |
And we want to do the model.encode that we did before. 00:09:45.320 |
Now, I'm going to write 'someone sprints with a football'. 00:10:25.720 |
we will return four index IDs into this I variable here. 00:10:48.520 |
So the text that we have up here, that will align. 00:10:52.080 |
So what we can do is we can print all of those out. 00:10:58.360 |
And then in here, we want to write lines[i] for i in I. 00:11:29.280 |
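Putting that together (k=4 gives us the four nearest neighbours):

```python
k = 4
xq = model.encode(["someone sprints with a football"])

# D holds the distances, I holds the index IDs of the k nearest vectors.
D, I = index.search(xq, k)

# The IDs align with the original text, so we can look the sentences back up.
print([lines[i] for i in I[0]])
```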
And we see, obviously, it seems to be working pretty well. 00:11:40.240 |
The only problem is that this takes a long time. 00:11:55.320 |
Okay, so before we move on to the next index, 00:12:00.560 |
I just want to have a look at the sort of speed 00:12:10.760 |
So if we go over here, I've already written all this code. 00:12:18.480 |
So come down here, we have this flat L2 index, 00:12:30.320 |
And this is the number of vectors within that index. 00:12:40.880 |
You can see, you know, it increases quite quickly. 00:12:44.560 |
Now this is in Faiss, but it's still an exhaustive search. 00:12:52.040 |
We're not using the approximate search capabilities 00:13:09.440 |
Now, the most popular of these uses a technique 00:13:13.280 |
very similar to something called Voronoi cells. 00:13:41.800 |
or the cells that you see are called Voronoi cells. 00:14:01.800 |
Now, as well as those, we also have our centroids. 00:14:08.960 |
And these are simply the centers of those cells. 00:14:45.520 |
this thing here, to every single one of those vectors, 00:14:58.440 |
is instead of checking against every one of those vectors, 00:15:03.800 |
And once we figure out which centroid is the closest, 00:15:18.560 |
So in this case, it would probably be this centroid here, 00:15:37.360 |
whereas the closest vector here is right there. 00:15:43.640 |
might actually be a better approximation or a better, 00:16:04.560 |
by the fact that this is just a lot, a lot faster. 00:16:11.040 |
It's whatever is going to work best for your use case. 00:16:20.760 |
is define how many of those cells we would like. 00:16:29.680 |
And then from there, we can set up our quantizer, 00:16:33.000 |
which is like another step in the process. 00:16:41.840 |
we are still going to be measuring the L2 distance. 00:16:44.560 |
So we still actually need that index in there. 00:16:48.200 |
So to do that, we need to write faiss.IndexFlatL2, 00:16:59.760 |
And like I said, that's just a step in the process. 00:17:14.640 |
So this is the one that is creating those partitions. 00:17:35.440 |
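As a sketch of that setup (the nlist value of 50 is just an example, tune it for your data):

```python
nlist = 50  # how many Voronoi cells / partitions to create

# The quantizer is the flat L2 index used to compare the query against cell centroids.
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist)
```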
Okay, now, if you remember what I said before, 00:17:41.520 |
we, in some cases, we'll need to train our index. 00:17:46.120 |
Now, this is an example of one of those times. 00:17:58.320 |
Now, to train it, we need to just write index.train, 00:18:09.360 |
we want to pass all of our sentence embeddings. 00:18:23.800 |
So now our index is essentially ready to receive our data. 00:18:28.800 |
So we do this exactly the same way as we did before. 00:18:46.000 |
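So, roughly:

```python
# IVF indexes cluster the data, so they must be trained before vectors are added.
index.train(sentence_embeddings)
print(index.is_trained)  # True once training has run

index.add(sentence_embeddings)
print(index.ntotal)
```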
Okay, so now we see that we have our index, it's ready, 00:18:51.800 |
So what I'm going to do is use the exact same query vector 00:18:59.080 |
Going to time it so that we can see how quick this is 00:19:36.960 |
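One simple way to time it (a sketch; a notebook %time magic works just as well):

```python
import time

start = time.time()
D, I = index.search(xq, k)
print(f"search took {time.time() - start:.4f}s")
print([lines[i] for i in I[0]])
```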
So we'll see that the times do vary a little bit 00:19:41.920 |
quite randomly, but maybe that's a little bit slow, 00:20:20.520 |
but maybe if you find the results are not that great 00:20:34.200 |
but maybe we should approximate a little bit less 00:20:39.120 |
And we can do that by setting the nprobe value. 00:20:52.400 |
and we can see it will probably take slightly longer. 00:21:02.680 |
'cause there were no accuracy issues here anyway. 00:21:06.320 |
But let me just explain what that is actually doing. 00:21:26.320 |
based on the first nearest centroid to our query vector. 00:21:33.920 |
or let's use a smaller number in this example. 00:22:00.200 |
So the number of cells that we are going to search is four. 00:22:14.800 |
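Which is just:

```python
# Search the 4 nearest cells instead of only the single closest one.
index.nprobe = 4

D, I = index.search(xq, k)  # slightly slower, but less approximate
```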
so we might get a better performance, better accuracy. 00:22:34.080 |
In our case, we don't really need to increase this, 00:23:05.640 |
So it's probably better if I try and draw this out. 00:23:28.440 |
So we split this into several sub-vectors and then we take them out. 00:23:38.640 |
And this is just one vector that I'm visualizing here, 00:23:43.640 |
but we would obviously do this with many, many vectors. 00:23:56.360 |
Now, that means that we have a lot of these sub-vectors. 00:24:16.400 |
and each of those clusters is going to have a centroid. 00:24:27.840 |
is going to be run through its own clustering algorithm 00:24:42.200 |
And what we do is for each of these sub-vectors, 00:24:47.200 |
so each of these sub-vectors, they get pulled into here. 00:24:55.200 |
and it gets assigned to its nearest centroid. 00:25:02.660 |
all the way back over here and add it into our vector. 00:25:17.160 |
it's probably the wrong way to think about it. 00:25:38.320 |
the size of our vectors, but pretty significantly, 00:25:53.460 |
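To put rough numbers on that reduction (using the M and bits values we'll pick in a moment):

```python
d, m, bits = 768, 8, 8

original_bytes = d * 4            # 768 float32 values = 3072 bytes per vector
compressed_bytes = m * bits // 8  # 8 centroid IDs of 8 bits each = 8 bytes per vector

print(original_bytes, compressed_bytes)
```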
Now we need to define two new variables here. 00:25:58.440 |
So M, which is going to be the number of centroid IDs in our final compressed vectors, 00:26:04.000 |
So now one thing that we do need to know with M is that D must be divisible by it. 00:26:30.040 |
Now we should be able to divide that into eight, I think. 00:26:40.720 |
because if we do five, we see that D doesn't fit. 00:26:45.480 |
So five doesn't fit nicely into D, whereas eight does. 00:26:59.640 |
And that's because of the way that those vectors 00:27:04.440 |
are broken down into the final centroid ID vectors. 00:27:09.440 |
And we also need to specify the number of bits 00:27:19.680 |
And then we can set up our index and also the quantizer. 00:27:26.640 |
So the quantizer is going to be faiss.IndexFlatL2(d). 00:27:45.160 |
So it's not a flat vector anymore, which is the full vector. 00:27:52.080 |
So where we have reduced the size of it through this, 00:27:57.080 |
through the method I explained before, where we drew out. 00:28:00.800 |
Now we need to pass a few arguments into here. 00:28:10.960 |
So the quantizer, then D, which is our dimensionality, 00:28:34.400 |
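So the full setup, roughly (nlist here is assumed to be the same value we used before):

```python
nlist = 50  # number of Voronoi cells, as before
m = 8       # number of sub-vectors; d must be divisible by m
bits = 8    # bits used for each centroid ID

quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, bits)
```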
You may have guessed that we might need to train this one. 00:28:40.840 |
So to train it, we'll just write index.train. 00:28:49.440 |
Okay, so it might take a little bit longer this time. 00:29:00.520 |
Should be a lot quicker, or a fair bit quicker. 00:29:03.040 |
It's hard to get much quicker than the last one. 00:29:05.480 |
So we're going to use the same code as before. 00:29:48.960 |
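Which is, roughly, the same pattern again:

```python
import time

# Train (this takes longer because of the product quantization step), then add.
index.train(sentence_embeddings)
index.add(sentence_embeddings)

start = time.time()
D, I = index.search(xq, k)
print(f"search took {time.time() - start:.4f}s")
print([lines[i] for i in I[0]])
```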
but these two at the front are now different. 00:30:02.640 |
Let's have a look at what we are pulling through. 00:30:14.240 |
I mean, although the accuracy has decreased technically, 00:30:23.840 |
So I mean, nonetheless, I think that is pretty cool. 00:30:31.880 |
let's compare this to our two previous methods 00:30:31.880 |
And then we have IVF flat with an nprobe value of 10, 00:30:48.200 |
And then we have flat L2 at the top, which obviously. 00:31:19.920 |
So I think obviously Faiss is pretty cool, 00:31:25.320 |
and I think we're definitely going to explore it more