
GUI-based Few Shot Classification Model Trainer | Demo


Chapters

0:00 Intro
1:14 Classification
2:49 Better Classifier Training
6:33 Classification as Vector Search
8:47 How Fine-tuning Works
10:50 Identifying Important Samples
12:39 CODE IMPLEMENTATION
13:13 Indexing
18:59 Fine-tuning the Classifier
27:37 Classifier Predictions
30:43 Closing Notes

Whisper Transcript

00:00:00.000 | Today we're going to talk about a more effective way of training classification models. Nowadays
00:00:06.400 | pre-trained models dominate the field of machine learning. There are very few ML projects that
00:00:13.040 | start with us actually training a model from scratch. Instead we usually start by looking
00:00:19.920 | for an off-the-shelf pre-trained model. Whether that pre-trained model is from an online platform
00:00:26.960 | like PyTorch Hub or HuggingFace Hub or from our own internal already trained in-house models.
00:00:34.960 | The ecosystem of these pre-trained models whether external or internal has allowed us to push the
00:00:42.480 | limits of what is possible in machine learning. This doesn't mean however that everything is
00:00:48.320 | super easy and everything works all the time. There are always going to be some challenges.
00:00:54.640 | Fortunately we're able to tackle a lot of these problems that are actually shared across a huge
00:01:00.960 | number of pre-trained models because they tend to have similar points of failure. One of those
00:01:07.120 | is the excessive compute and data needed to actually fine-tune one of these models. Now
00:01:14.720 | focusing on classification, a very typical scenario is that we have some big
00:01:22.640 | model like BERT or T5 and what we want to do is fine-tune this model for classification. Now one
00:01:30.400 | way we can do that is we add a simple linear layer onto the end of it and then we fine-tune that
00:01:37.040 | linear layer. Now what I want us to focus on here is the model that comes before doesn't really
00:01:43.200 | matter. We only really care about this linear layer. We can actually fine-tune that for a lot
00:01:48.400 | of different use cases without even touching the model weights of the big pre-trained model that
00:01:54.160 | comes before it. It's the classification layer that is actually producing the final prediction, and
00:02:00.480 | because of this that classification layer can become the single point of failure in producing
00:02:07.040 | our predictions. So we focus on fine-tuning that classification layer and a common approach to
00:02:12.800 | doing this might look a little bit like this. First we have to collect a data set that focuses
00:02:19.120 | on enabling this model to adapt to a new domain or just dealing with data drift. Then we have to
00:02:27.280 | slog through the data set (and if it's going to work well, it's usually a large data set), labeling
00:02:33.440 | the records as per their classification. Then, once all records have been labeled, we have to
00:02:40.000 | fine-tune the classifier. This approach works but it really is not efficient. There's actually a much
00:02:47.760 | better way of doing this. What we need to do is focus our fine-tuning efforts on the essential
00:02:54.720 | records that actually matter. Otherwise we're wasting time, our own time, and compute on
00:03:01.680 | annotating and fine-tuning across the entire data set when the vast majority of the data in the
00:03:08.160 | data set probably doesn't matter. So now the question is how do we decide which samples are
00:03:15.440 | actually essential and which are not? Well that's where we can use vector search. We can use vector
00:03:21.520 | search to search through our data set before we even annotate everything and identify the records
00:03:29.680 | that are going to make the biggest impact on our model performance. Meaning we save our time and a
00:03:35.600 | lot of compute by just skipping the non-essential records. Some of you may be thinking what does
00:03:41.440 | vector search have to do with training a classification model? Well it's actually
00:03:47.440 | super important. Many state-of-the-art models are available as pre-trained models. Those are models
00:03:55.280 | like BERT, T5, EfficientNet, OpenAI's CLIP. These models use an insane number of parameters and
00:04:06.400 | perform a lot of complex operations. Yet when applied to classification we're actually relying
00:04:14.880 | on the final layers that are added onto the end of these huge models. So we might have
00:04:22.640 | some simple feed forward layers or just a linear classification layer. Now the reason for this is
00:04:29.440 | that these models they're not being trained to produce class predictions. We can think of them
00:04:38.000 | as actually being trained to make vector embeddings. So we pre-train these big models
00:04:44.400 | and the idea is that after pre-training these models will produce these very information rich
00:04:53.440 | vector embeddings. And then what we do for different tasks is that we add an extra task
00:05:01.760 | specific head onto the end of that. And that task specific head is taking that vector embedding or
00:05:08.160 | vector embeddings from the model and running them through a smaller network. Like I said it can just
00:05:15.920 | be a linear layer and outputting something else. Outputting those predictions. So the power of
00:05:22.560 | these models is not that they can do classification, question answering, all these different
00:05:29.840 | things. The power of these models is that they produce these very information rich vectors that
00:05:35.600 | then smaller simpler models can use to do these tasks of question answering, classification and so
00:05:42.320 | on. These vectors that these models are producing are simply full of useful information that have
00:05:50.160 | been encoded into a vector space. Okay so you can imagine in this vector space, imagine a 2D space,
00:05:57.280 | we have vector A here, vector B here. Those two are very close to each other
00:06:04.240 | and therefore they share some sort of similar meaning. Whereas vector C over here is very far
00:06:10.640 | away from A and B. Therefore it shares less meaning with A and B. Now the result of this is
00:06:18.640 | that these models are essentially creating a map of information. Using this map they're able to
00:06:24.560 | consume data like images or text and output these useful information-rich representations with
00:06:32.080 | vectors. So our task in classification now is not to consume data and try and abstract
00:06:42.880 | different meaning from that and classify that abstraction of meaning. In reality the abstraction
00:06:50.080 | of meaning is already handled by the big models. Instead our task with classification is to teach
00:06:57.920 | a smaller model to identify the different regions within that map or the vector space. Now a typical
00:07:06.000 | architecture that we will see for classification is a pre-trained model followed by a linear layer.
00:07:12.400 | Now we can think of the internal weights of this classifier as actually being a vector within the
00:07:20.640 | wider vector space. And Edo Liberty, the founder and CEO of Pinecone and past head of Amazon AI
00:07:29.760 | Labs explained to me that we can actually use this fact and couple it with vector search in order to
00:07:37.360 | massively optimize the learning process for our classifier. So what we need to do is really
00:07:45.600 | imagine this problem as being within a vector space or a map. We have the internal model weights
00:07:51.600 | w and we have all these vectors that as of yet are unannotated and we haven't fine-tuned on them yet.
00:07:58.560 | We want to calculate the dot products between w and x. If they share a positive direction
00:08:04.560 | they will have a positive value and they produce a negative score if the directions are opposite.
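For reference, here is a minimal numpy sketch (not from the video's notebooks) of how raw dot products can be misled by magnitude, and how the normalization covered next removes that problem:

```python
import numpy as np

w = np.array([1.0, 1.0])             # classifier weights (a direction in 2D)
x_aligned = np.array([0.5, 0.5])     # same direction as w, small magnitude
x_opposite = np.array([-1.0, -1.0])  # opposite direction to w
x_big = np.array([5.0, 0.0])         # different direction, large magnitude

print(w @ x_aligned)   # 1.0  -> positive score, shared direction
print(w @ x_opposite)  # -2.0 -> negative score, opposite direction
print(w @ x_big)       # 5.0  -> beats the aligned vector purely on magnitude

def normalize(v):
    return v / np.linalg.norm(v)

# After normalization the dot product depends on direction only (cosine similarity).
print(normalize(w) @ normalize(x_aligned))  # ~1.0
print(normalize(w) @ normalize(x_big))      # ~0.71, now below the aligned pair
```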
00:08:11.440 | Now there is just one problem with dot product here. It considers both direction and magnitude,
00:08:16.720 | which means that if one vector x has a larger magnitude than another, it can actually
00:08:24.800 | output a larger dot product score, even if that other vector points in the same (or a very similar)
00:08:31.520 | direction as our model weights. So what we need to do is normalize all these vectors that
00:08:37.440 | we're comparing. This simply removes the magnitude problem and means that we are comparing only
00:08:44.960 | the direction of the vectors. Now when we fine-tune the linear classifier with these vectors
00:08:52.800 | it's going to learn to align itself with vectors that we label as positives and move away from
00:09:00.800 | vectors we label as negatives. Now this will work really well but there are still some improvements
00:09:08.160 | that we could add in here. First imagine we return only irrelevant samples in a single training batch.
00:09:16.240 | They will all be marked as negative one and the classifier knows to move away from these values
00:09:21.280 | but it doesn't know in which direction. Okay and especially in a high dimensional space there are a
00:09:25.840 | lot of directions that the classifier can move in. So this is problematic because it means that the
00:09:31.360 | classifier is just going to be moving at random away from those negative vectors. Another problem
00:09:37.280 | is that some samples may be more or less relevant than others. So imagine we had the query dogs in the snow and then
00:09:45.280 | we had two pieces of text a dog and a dog in the snow. Both of those are relevant depending on what
00:09:53.840 | you're looking at but a dog in the snow is more relevant. These two pieces of text are not equally
00:10:01.600 | relevant but at the moment all we can do is label one as negative one as positive or both as
00:10:09.520 | positives and that's not really ideal because it doesn't really show the full picture of both of
00:10:16.720 | these are relevant just one is more than the other. So what we need is almost like a gradient of
00:10:21.920 | relevance. We need a continuous range from negative (e.g. minus one) to positive (e.g. plus one). Even if
00:10:29.680 | we just have a range from negative one to negative 0.8 there's still a direction that the model can
00:10:37.440 | figure out from that range of values. So all of this together just allows our linear classifier
00:10:44.720 | to learn where to place itself within the vector space produced by the model layers preceding it.
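As a rough illustration of that idea, a toy numpy loop can nudge a unit-norm weight vector toward graded relevance targets in [-1, 1]. This is a sketch using a plain mean-squared-error gradient step, not the video's exact training code:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=512)
w /= np.linalg.norm(w)                        # classifier weights on the unit sphere

X = rng.normal(size=(6, 512))
X /= np.linalg.norm(X, axis=1, keepdims=True) # normalized sample embeddings

# Graded relevance labels instead of hard 0/1: even a mostly-negative batch
# like [-1.0, -0.9, -0.8, ...] still encodes a direction to move in.
y = np.array([1.0, 0.8, 0.1, -0.4, -0.9, -1.0])

lr = 0.5
for _ in range(100):
    scores = X @ w                            # dot products = current predictions
    w -= lr * X.T @ (scores - y) / len(y)     # MSE gradient step toward the labels
    w /= np.linalg.norm(w)                    # re-normalize after each step

print(np.round(X @ w, 2))                     # scores now roughly track the label ordering
```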
00:10:50.560 | Now that describes a fine-tuning process but we can't do this across our entire data set.
00:10:57.520 | If we have a big data set, which we probably do, it would take too much time annotating everything
00:11:03.920 | and it would be a waste of our time as well. To do this efficiently what we must do is capitalize
00:11:10.480 | on the idea of identifying relevant versus irrelevant vectors within a proximity of the
00:11:17.040 | model's learned weights w. So we focus our efforts on the specific area that is actually going to be
00:11:24.560 | helpful. For an already trained classifier those are going to be the false positives and false
00:11:29.360 | negatives predicted by the classifier. However we also usually don't have a list of false negatives
00:11:37.680 | and false positives but we do know that the solvable errors will be present near the classifier's
00:11:44.560 | decision boundary, i.e. the line that separates the positive predictions from negative predictions.
00:11:50.640 | So we use vector search in order to actually pull in the high proximity samples that are most
00:11:58.000 | similar to the model weights w. We then label those vectors and use them for training our model.
00:12:05.200 | The model optimizes those internal weights w. We extract them again and then we perform a vector
00:12:11.200 | search with them again and we just keep repeating this process over and over again until the linear
00:12:16.960 | classifier has been optimized and is producing the correct predictions that we need. So by focusing
00:12:24.000 | annotation and training on these essential samples we avoid wasting time and compute on
00:12:30.560 | those vectors that don't make as much of a difference. Okay so all of that is the general
00:12:36.640 | idea behind this process. Now let's have a look at how we can put all that together and fine-tune a
00:12:43.200 | classifier with vector search. Now we will see that there are two parts to the training process.
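As a compact outline of the whole procedure (hypothetical helper names; a sketch rather than the notebooks' exact code):

```python
def train_with_vector_search(index, fit, annotate, n_rounds=10, top_k=10):
    """Part 2 of the process: the fine-tuning loop over an already-built index."""
    w = init_query_vector()   # hypothetical: e.g. an embedded text prompt
    for _ in range(n_rounds):
        # Pull the unseen records nearest to the current weights w.
        res = index.query(vector=w, top_k=top_k,
                          filter={"seen": 0}, include_values=True)
        scores = annotate(res)   # human-assigned relevance in [-1, 1]
        w = fit(w, res, scores)  # gradient steps on the linear classifier
        mark_seen(index, res)    # hypothetical: set seen=1 on labeled records
    return w
```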
00:12:49.200 | First we need to index our data so that is where we embed everything using the preceding model
00:12:55.680 | layers e.g. BERT or CLIP or so on and then store those in a vector database and then step two is
00:13:01.920 | that we actually fine-tune the classifier. So query with model weights w, return the most similar
00:13:08.240 | records, annotate them and then use them to fine-tune the classifier. So let's go ahead and
00:13:14.800 | start with indexing. Given a data set of images or other formats we first need to process everything
00:13:22.240 | through the big model preceding our linear classifier to create the vector embeddings.
00:13:28.320 | For our example we're going to use a model called CLIP that's capable of understanding both
00:13:32.960 | text and images and it has been trained on text image pairs and has learned how to encode them
00:13:41.120 | into as similar a vector space as possible. So what we're going to need to start with before indexing
00:13:47.200 | anything is initializing a data set that we can then encode with CLIP. So we're going to use this
00:13:54.160 | data set from Hugging Face datasets hub. So we can pip install everything we're going to need for
00:14:00.000 | this here. We're taking the train split and that contains 9,500 images. Some of those are radios
00:14:08.560 | like you can see here, there's pictures of dogs, trucks and a few other things. And we can see
00:14:16.000 | an array of one of those images right there. Now it's not so important for what we're doing here.
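The transcript doesn't name the dataset on screen, but the class list (radios, dogs, trucks, chainsaws) and the roughly 9,500-image train split match the Imagenette subset of ImageNet, so loading it plausibly looks like this (the dataset ID is an assumption):

```python
from datasets import load_dataset

# "frgfm/imagenette" is an assumption based on the classes and split sizes
# mentioned in the video; swap in whichever dataset you're actually using.
data = load_dataset("frgfm/imagenette", "320px", split="train")
print(len(data))          # roughly 9,500 images
print(data[0]["image"])   # a PIL image (radios, dogs, trucks, ...)
```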
00:14:22.640 | What we do want to do is actually initialize both the model and the pre-processing steps
00:14:30.160 | before the data is being fed into the model. So we do that here. So initialize the model CLIP
00:14:35.600 | using this model ID here. Okay so this is one version of the CLIP model. And then the pre-processor
00:14:43.920 | will just take images and process them so that CLIP can read them. Okay so all we're doing here
00:14:50.160 | is going through all of these steps. This is the pre-processing and from that we get
00:14:57.040 | the image features. Those image features are a vector representation of the image. So in this
00:15:05.680 | case we've done the Sony radio image and that gives us a 512 dimensional vector embedding.
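In code, that step looks roughly like the following, using the Hugging Face transformers CLIP API. The exact checkpoint is an assumption, chosen because the ViT-B/32 CLIP variant produces 512-dimensional embeddings:

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-base-patch32"  # assumed: a 512-dim CLIP variant
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

image = data[0]["image"]  # e.g. the Sony radio image (index is illustrative)
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    emb = model.get_image_features(**inputs)   # shape (1, 512)

# CLIP embeddings are not normalized out of the box; L2-normalize them so
# dot-product search compares direction only (as discussed next).
emb = emb / emb.norm(dim=-1, keepdim=True)
```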
00:15:15.840 | The embeddings from CLIP are not normalized. Okay so we're going to be using dot product both
00:15:22.000 | within the model and during our vector search. So we should really normalize these. So we do that
00:15:28.720 | here and then we see that these values are all between negative one and plus
00:15:36.880 | one. Now that's how we embed or create a vector embedding for a single item. But we're going to
00:15:43.680 | want to do this for loads of items and we're also going to want to index them and store them inside a
00:15:48.160 | vector database. So we're going to use Pinecone for this. You may need to sign up for a free API
00:15:54.560 | key if you haven't already. And what we do is initialize our connection to Pinecone here.
00:16:00.720 | You just put your API key here. It's all free. And then we create an index. Now it's important
00:16:07.040 | that we have a few things here. So the index name that doesn't actually matter. Okay you can put
00:16:11.600 | whatever you want. But what you do need is the correct dimensionality. So that is the 512 that
00:16:18.800 | you saw up here. That is what we put in here. We do need to make sure that we're using dot product
00:16:26.080 | similarity. And we're going to also include this metadata config. So basically once we see an
00:16:33.120 | image and we label it, we're going to tell Pinecone we don't want to return that image again. Okay so
00:16:38.560 | that we can go through and not over optimize on like 10 images. And then we connect to the index
00:16:46.880 | after we have created it there. Now to add that single feature embedding that we just created,
00:16:55.120 | that image embedding we just created, we would do this. Okay so we have an ID and then we just
00:17:00.800 | convert the embedding into a list format and we just upsert. So with that we have one embedding
00:17:09.680 | within our vector index. But of course we want to have our full data set in there so we can search
00:17:16.000 | for it and add data and so on. So to do that we're going to use this loop here. I'm not going to go
00:17:23.440 | through because it's literally what we've just done. Okay the only thing I think I've added here
00:17:27.760 | is this which is checking for grayscale versus RGB images. But the rest of this is exactly the same.
00:17:36.880 | Okay we're just doing it all at a larger scale and we're also adding in the metadata
00:17:42.240 | here. Okay so that's seen. We're setting it to zero for all the images to start with and then
00:17:46.640 | we'll set it to one once we've seen a set of images. Mark them as you know positive or negative
00:17:53.600 | and train with them. Then we set that seen value to one so we don't return it again.
00:17:58.240 | Okay so we have this radio. Let's have a quick look at how we might query.
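A minimal sketch of that query step, continuing from the variables above (again, the exact client signature is an assumption):

```python
xq = emb[0].tolist()   # the normalized radio embedding from the sketch above
res = index.query(vector=xq, top_k=5, include_metadata=True)
for match in res["matches"]:
    print(match["id"], round(match["score"], 3))
```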
00:18:05.840 | So we create our query vector xq here, which is just doing the same thing again as what
00:18:13.840 | we did before: normalizing it and then querying with it. Okay and that returns these items here
00:18:20.880 | from Pinecone. Let's have a look at what they look like. So the first one is obviously that radio.
00:18:25.600 | That radio's vector is the most similar. So naturally that would be the first thing that
00:18:30.000 | gets returned. Okay next one we have a car radio. We have another Sony radio. I think it's even the
00:18:36.640 | same model. And another Sony radio which is also the same model. It seems so. And then just another
00:18:45.600 | radio. It's very similar. So clearly those embeddings are pretty good from CLIP. But now
00:18:51.360 | what we want to do is fine-tune a linear classifier on top of that to classify these different images.
00:18:58.960 | Okay so to do that I'm going to start from scratch. So this is a new notebook. You can find all the
00:19:04.400 | links to these notebooks by the way in the video description or if you're watching this on the
00:19:09.360 | article down at the bottom of the article in the resources section. So here initialize the connection
00:19:15.600 | to the index again. You don't need to do this if you just ran through the last bit of code. You can
00:19:19.600 | just keep that as it is and maintain your connection to the index. Again we're going to
00:19:26.240 | load the data set and again you don't need to do that if you've already done it. Initialize the
00:19:33.760 | model. So CLIP and the processor. There's one thing different here: you can actually
00:19:40.480 | tokenize using the other preprocessor, but for the sake of covering everything I'm just showing you
00:19:46.080 | how to do it with CLIPTokenizerFast here as well. So here we're initializing just the tokenizer
00:19:52.160 | side of the CLIP preprocessor. And we're setting up this prompt: dogs in the snow. We tokenize
00:19:58.960 | them to get a set of token IDs and then we use the model get text features method in order to
00:20:06.720 | get a vector embedding of that text, of that dogs in the snow prompt. Okay and we come down here. We
00:20:15.280 | create the query vector from that and we're just going to retrieve top 10 most similar records and
00:20:22.400 | store them in xc, which is just the response contents. So there's a few things in xc here.
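Sketched out, those steps look something like this (CLIPTokenizerFast and get_text_features are standard transformers APIs; the surrounding variable names are illustrative):

```python
from transformers import CLIPTokenizerFast

tokenizer = CLIPTokenizerFast.from_pretrained(model_id)

prompt = "dogs in the snow"
tokens = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    text_emb = model.get_text_features(**tokens)       # shape (1, 512)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

xq = text_emb[0].tolist()
xc = index.query(vector=xq, top_k=10, include_values=True)
ids = [m["id"] for m in xc["matches"]]
values = [m["values"] for m in xc["matches"]]   # the vectors we'll train on
```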
00:20:29.360 | We actually don't need all of this. So what we want is the IDs and then the values as well.
00:20:37.920 | So first we get the IDs then we get the values. Okay and we can see why it's returned. So dogs
00:20:45.840 | in the snow. Right this one is not a dog in the snow but you can kind of see where it's a bit
00:20:49.360 | confused. The sand in the background does look kind of white and snowy. But then the rest of
00:20:55.760 | these yeah they're dogs in the snow other than this one. So it's returning the right thing here
00:21:03.440 | but let's say we don't want dogs in the snow. Okay let's say we want to adjust this to something
00:21:09.680 | slightly different. Like for example dogs at dog shows and we'll go through this. So this code here
00:21:17.040 | not really that important. All this is is a little interface that I built within Jupyter so that we
00:21:23.760 | can sort of quickly go through and label the images. So I would run this. Okay I'm not going
00:21:33.040 | to run it again. So I'll just run this here and basically what it's going to do is it's going to
00:21:38.240 | show an image. For example this one here: it's going to show the image and ask what you'd rate it
00:21:43.360 | from negative one to one. And you just go through and say what you would rate it.
00:21:49.280 | And that will basically produce a dictionary that maps these IDs to the
00:21:56.240 | score that you gave it. So you can see all the scores I gave last time I ran this. And you can
00:22:02.640 | just double check that the IDs and scores are aligned here. Yes they are, so you don't need to
00:22:07.840 | worry so much about that. And all we do is we need to get the values, which are going to be the input
00:22:14.320 | training data for the linear classifier. And then we get the labels, okay, so the scores.
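The training round described next, sketched in PyTorch. This is a reconstruction under assumptions rather than the notebook verbatim: rescaling the [-1, 1] scores into [0, 1] for BCEWithLogitsLoss is one reasonable choice, and `labels` is assumed to be the ID-to-score dictionary from the annotation interface:

```python
import torch

# Seed the linear layer with the query vector to emulate an already trained classifier.
clf = torch.nn.Linear(512, 1, bias=False)
clf.weight = torch.nn.Parameter(torch.tensor([xq]))

loss_fn = torch.nn.BCEWithLogitsLoss()
opt = torch.optim.SGD(clf.parameters(), lr=0.1)   # deliberately high learning rate

def fit(X, y, iters=5):
    """One training round on returned vectors X and human scores y in [-1, 1]."""
    X = torch.tensor(X, dtype=torch.float32)
    y = torch.tensor(y, dtype=torch.float32).unsqueeze(-1)
    y = (y + 1) / 2                # rescale to [0, 1] for BCE (an assumption)
    for _ in range(iters):
        opt.zero_grad()
        loss = loss_fn(clf(X), y)
        loss.backward()
        opt.step()

scores = [labels[_id] for _id in ids]   # assumed ID-to-score dict from labeling
fit(values, scores)

# The optimized weights become the next query; mark what we've labeled as seen.
xq = clf.weight.detach().numpy().flatten().tolist()
for _id in ids:
    index.update(id=_id, set_metadata={"seen": 1})
xc = index.query(vector=xq, top_k=10, include_values=True, filter={"seen": 0})
```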
00:22:18.880 | So we go through and what we're going to do here is just initialize a PyTorch linear classifier
00:22:27.200 | layer. And what I do first is, well, in most cases I imagine that we're going to have a linear
00:22:35.520 | classifier already trained. So I'm just emulating that here. So I'm getting the
00:22:40.080 | query vector, reshaping it, and inserting it as the first set of model weights w. And
00:22:49.520 | then we're going to initialize the loss. We're going to use BCEWithLogitsLoss.
00:22:54.320 | And we're going to use stochastic gradient descent. Now this learning rate, you'll probably find, is
00:22:59.680 | quite high. And it is high. We're just kind of putting it high so that we can see a lot of like
00:23:06.160 | quick movement through the data set. If you're actually implementing something like this you
00:23:09.920 | might want to use a lower learning rate. So with that we just create this function fit here which
00:23:19.120 | is basically just a training loop. And we can set the number of iterations per training loop. Again
00:23:24.320 | you might want to lower this if you don't want to move so quickly through the vector space and keep
00:23:28.320 | things a bit more stable. And yeah we'll just call fit. From that the model weight will actually
00:23:37.440 | be optimized and it will change. And that will represent the next query that we're going to pass
00:23:44.320 | into our vector database. So we convert it into a flat list so that we can query
00:23:50.560 | in Pinecone with it. And so that we're not returning the same records that we just went
00:23:57.280 | through, we update the metadata attached to each one of the vectors that we've just seen
00:24:02.800 | to be set to seen equals one. And then the reason we do that is because we add a filter now
00:24:13.200 | to the next query where we set seen equal to zero. Okay and then we return the next set of
00:24:19.120 | queries and we can see here we have some other images. And basically what I'm doing here is
00:24:23.680 | trying to optimize for dogs in fields. And then from dogs in fields we're going to try and move
00:24:29.440 | to dogs at dog shows. Okay and we'll just go through this bit quickly now. So this is just
00:24:35.120 | tuning. So I'm putting all of what we just did into a single function just to make things a bit
00:24:39.760 | simpler. And yeah we'll go through. Okay so you can see how things are kind of changing more towards
00:24:46.160 | dogs in fields here. And then here it goes a bit crazy because basically I'm putting a lot of
00:24:53.040 | dogs as negative. So now it's thinking maybe I don't actually want to see any dogs.
00:24:57.120 | And that makes it push away from that. But obviously I don't want that to happen. So I just
00:25:04.480 | set everything negative here other than I think this image that has a field or maybe this image
00:25:10.000 | that has a field and also this image of a dog. And then we go towards dogs again. Focus on that.
00:25:17.360 | Push towards dogs. And then here you can see the first in the middle right here. There's the first
00:25:23.360 | image of dogs at a dog show. Actually I think this is also a dog show here. So that would technically
00:25:28.240 | be the first one. But this is what I'm looking for. More like this sort of image. So we focus
00:25:34.160 | on that and we push for that a little more. Next one we see oh okay we have a few more dog shows
00:25:39.600 | here. So here and here. And we keep pushing for that. And you can see as we go through each step
00:25:45.920 | there's more of these dogs in dog shows. Because that's what I'm labeling as being more relevant.
00:25:51.280 | Okay and now we're really getting into that sort of space. Keep going and now we're at the point
00:25:59.520 | where pretty much everything we're returning is a dog show. So this is the final bit. So now
00:26:06.560 | that we've done that, we want to set all of the seen labels in our vector database back to not
00:26:13.760 | seen. Okay, because we want to search again. We can either search without the filter just to check
00:26:19.600 | that it has trained the classifier, or we just reset all of those seen labels. If you wanted
00:26:27.280 | to go through the data again and focus more on those, that's where you might want to reset all the
00:26:32.720 | labels back to zero. So to do that all I'm going to do is go through a while loop. And we keep
00:26:39.440 | going through and we search for everything where the filter is seen equals one. We get those IDs
00:26:44.880 | and then we mark them as not seen. Once we don't return any more items, that means we've
00:26:50.480 | set everything back to not seen, because nothing else is returned with seen equal to
00:26:56.240 | one. So at that point we break. So after that if we search again we get a completely unfiltered
00:27:05.120 | view of the search results. And here we go. Okay so we can see loads of dogs at dog shows.
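Before looking closer at those results, here is a sketch of the reset loop just described (exact client method signatures are assumptions):

```python
# Flip every record marked seen=1 back to seen=0 so future searches are unfiltered.
while True:
    xc = index.query(vector=xq, top_k=100, filter={"seen": 1})
    if not xc["matches"]:
        break                      # nothing left marked as seen
    for match in xc["matches"]:
        index.update(id=match["id"], set_metadata={"seen": 0})
```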
00:27:13.680 | Now there's one here that isn't a dog at a dog show. I think the rest of them are.
00:27:18.960 | So with that we've actually fine-tuned our classifier. So now that we've finished
00:27:25.520 | optimizing those model weights we can save them to file. Okay so we do this. And with that let's
00:27:31.920 | have a look at how the model performs on actually classifying images. So again move to another
00:27:39.600 | notebook. This is number 02 classifier tests. And here we're just going to test the classifier
00:27:46.960 | on a set of images that it has not seen before. So again we initialize everything. Again if you've
00:27:55.040 | already loaded everything and you're in the same notebook you don't need to do this.
00:28:00.240 | So we need to load the validation split from ImageNet. So you can see here this before was
00:28:08.480 | train. Now it's validation. So you will need to rerun this bit. And we have about 4,000 images
00:28:15.040 | there. Now let's start by checking the predictions for some specific images. Okay so this one is a
00:28:23.120 | dog at a dog show. So we pre-process that. We get the image features from CLIP. And then we make a
00:28:32.240 | prediction. So we call the classifier and pass in the vector output by CLIP. And we can see
00:28:39.040 | there's a pretty positive value there. So positive, remember, is a true value. Negative is a not-true
00:28:48.880 | prediction. Okay cool. So that's correct. It's predicted that that is a dog show.
00:28:55.840 | Now let's have a look at this. Okay this is not a dog show. So we should see that it will predict
00:29:00.640 | a negative value. So let's go through and yeah we get a pretty negative value there.
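Saving the weights and scoring a single image might look like this (the file name and helper variables are illustrative):

```python
import numpy as np

np.save("classifier_weights.npy", clf.weight.detach().numpy())  # save to file
w = np.load("classifier_weights.npy")                           # shape (1, 512)

inputs = processor(images=image, return_tensors="pt")           # a test image
with torch.no_grad():
    emb = model.get_image_features(**inputs)
emb = (emb / emb.norm(dim=-1, keepdim=True)).numpy()

score = float(emb @ w.T)   # positive -> predicted "dog show", negative -> not
print(score)
```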
00:29:05.840 | So we can label the full data set and we'll find a cutoff point between what is viewed as
00:29:11.760 | relevant and what is irrelevant. So basically anything that's positive. So we do that here.
00:29:17.520 | I'm not going to go through it but it's essentially the same thing as what we just did. I'm just
00:29:21.520 | making a list of these predictions. Okay I'm going to add a column to the ImageNet data set
00:29:28.640 | called predictions. So we now have these three. And let's have a look. So filter out any results
00:29:35.440 | where the prediction is not positive. So we get 23 results. And let's have a look at what those are.
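That batch-scoring and filtering step, sketched with datasets' map and filter, reusing the loaded weights w from the previous sketch and assuming the validation split from above is loaded as `val`:

```python
def add_prediction(record):
    img = record["image"]
    if img.mode != "RGB":
        img = img.convert("RGB")
    inputs = processor(images=img, return_tensors="pt")
    with torch.no_grad():
        emb = model.get_image_features(**inputs)
    emb = (emb / emb.norm(dim=-1, keepdim=True)).numpy()
    record["prediction"] = float(emb @ w.T)
    return record

val = val.map(add_prediction)                           # adds a prediction column
positives = val.filter(lambda r: r["prediction"] > 0)   # keep positive scores only
print(len(positives))                                   # 23 in this walkthrough
```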
00:29:42.560 | So those 23 positive results. All of them, I think almost all of them, are dog shows.
00:29:52.560 | And we keep going through. So each one of these, as we go through, has been scored
00:29:58.320 | progressively lower, but all of these are still scored very highly. Okay, I'm going through.
00:30:08.080 | And then we go through and then yeah we get this, I don't know, emoji chainsaw thing which is right
00:30:15.680 | at the bottom of these positively labeled things. It's kind of random. I don't know why it's in
00:30:20.960 | there. Yeah so other than literally these two images right at the end, everything else is a
00:30:28.000 | true positive. So it's predicted everything correctly other than these two. This one,
00:30:33.600 | no idea why. This one, I kind of understand, you know, dogs in a field. So generally speaking these
00:30:40.320 | are I think very good results. And we got these from fine-tuning our classifier on
00:30:46.960 | not really that many images. I think there was maybe 50 images there. So really good results
00:30:53.920 | on a very small amount of data. And that's because we're using vector search to focus our annotation
00:31:00.720 | and training on what is the most important part of the data set. Now doing this for an image
00:31:07.360 | classifier is just one example. We can do this with text. We can do this in recommendation engines
00:31:13.600 | or even anomaly detection. There's like a huge number of use cases with this. Basically whenever
00:31:19.920 | you need to classify something and you want to do that efficiently, you can use this as long
00:31:26.640 | as you're using something like a linear classifier. So for me, I think that was a really cool method
00:31:35.920 | for efficiently training classification models. Thank you a lot to Edo for actually sharing
00:31:42.480 | this and explaining and walking me through everything. I think, yeah, this is a really
00:31:48.080 | useful method and I hope you will find it useful as well. So thank you very much for watching and
00:31:54.960 | I will see you again in the next one. Bye.
00:32:01.200 | [MUSIC]