GUI-based Few Shot Classification Model Trainer | Demo
Chapters
0:00 Intro
1:14 Classification
2:49 Better Classifier Training
6:33 Classification as Vector Search
8:47 How Fine-tuning Works
10:50 Identifying Important Samples
12:39 CODE IMPLEMENTATION
13:13 Indexing
18:59 Fine-tuning the Classifier
27:37 Classifier Predictions
30:43 Closing Notes
00:00:00.000 |
Today we're going to talk about a more effective way of training classification models. Nowadays 00:00:06.400 |
pre-trained models dominate the field of machine learning. There are very few ML projects that 00:00:13.040 |
start with us actually training a model from scratch. Instead we usually start by looking 00:00:19.920 |
for an off-the-shelf pre-trained model, whether that model comes from an online platform 00:00:26.960 |
like PyTorch Hub or Hugging Face Hub, or from our own already trained in-house models. 00:00:34.960 |
The ecosystem of these pre-trained models whether external or internal has allowed us to push the 00:00:42.480 |
limits of what is possible in machine learning. This doesn't mean however that everything is 00:00:48.320 |
super easy and everything works all the time. There are always going to be some challenges. 00:00:54.640 |
Fortunately we're able to tackle a lot of these problems that are actually shared across a huge 00:01:00.960 |
number of pre-trained models because they tend to have similar points of failure. One of those 00:01:07.120 |
is the excessive compute and data needed to actually fine-tune one of these models. Now 00:01:14.720 |
focusing on classification, a very typical scenario is that we have some big 00:01:22.640 |
model like BERT or T5 and what we want to do is fine-tune this model for classification. Now one 00:01:30.400 |
way we can do that is we add a simple linear layer onto the end of it and then we fine-tune that 00:01:37.040 |
linear layer. Now what I want us to focus on here is the model that comes before doesn't really 00:01:43.200 |
matter. We only really care about this linear layer. We can actually fine-tune that for a lot 00:01:48.400 |
of different use cases without even touching the model weights of the big pre-trained model that 00:01:54.160 |
comes before it. It's a classification layer that is actually producing the final prediction and 00:02:00.480 |
because of this that classification layer can become the single point of failure in producing 00:02:07.040 |
our predictions. So we focus on fine-tuning that classification layer and a common approach to 00:02:12.800 |
doing this might look a little bit like this. First we have to collect a data set that focuses 00:02:19.120 |
on enabling this model to adapt to a new domain or just dealing with data drift. Then we have to 00:02:27.280 |
slog through the data set (and if it's going to work well, it's usually a large data set), labeling 00:02:33.440 |
the records as per their classification. Then, once all records have been labeled, we have to 00:02:40.000 |
fine-tune the classifier. This approach works but it really is not efficient. There's actually a much 00:02:47.760 |
better way of doing this. What we need to do is focus our fine-tuning efforts on the essential 00:02:54.720 |
records that actually matter. Otherwise we're wasting time, our own time, and compute on 00:03:01.680 |
annotating and fine-tuning across the entire data set when the vast majority of the data in the 00:03:08.160 |
data set probably doesn't matter. So now the question is how do we decide which samples are 00:03:15.440 |
actually essential and which are not? Well that's where we can use vector search. We can use vector 00:03:21.520 |
search to search through our data set before we even annotate everything and identify the records 00:03:29.680 |
that are going to make the biggest impact on our model performance. Meaning we save our time and a 00:03:35.600 |
lot of compute by just skipping the non-essential records. Some of you may be thinking what does 00:03:41.440 |
vector search have to do with training a classification model? Well it's actually 00:03:47.440 |
super important. Many state-of-the-art models are available as pre-trained models. Those are models 00:03:55.280 |
like BERT, T5, EfficientNet, and OpenAI's CLIP. These models use an insane number of parameters and 00:04:06.400 |
perform a lot of complex operations. Yet when applied to classification we're actually relying 00:04:14.880 |
on the final layers that are added onto the end of these huge models. So we might have 00:04:22.640 |
some simple feed forward layers or just a linear classification layer. Now the reason for this is 00:04:29.440 |
that these models are not being trained to produce class predictions. We can think of them 00:04:38.000 |
as actually being trained to make vector embeddings. So we pre-train these big models 00:04:44.400 |
and the idea is that after pre-training these models will produce these very information rich 00:04:53.440 |
vector embeddings. And then what we do for different tasks is that we add an extra task 00:05:01.760 |
specific head onto the end of that. And that task specific head is taking that vector embedding or 00:05:08.160 |
vector embeddings from the model and running them through a smaller network. Like I said it can just 00:05:15.920 |
be a linear layer and outputting something else. Outputting those predictions. So the power of 00:05:22.560 |
these models is not that they can do classification, question answering, all these different 00:05:29.840 |
things. The power of these models is that they produce these very information rich vectors that 00:05:35.600 |
then smaller simpler models can use to do these tasks of question answering, classification and so 00:05:42.320 |
on. These vectors that these models are producing are simply full of useful information that have 00:05:50.160 |
been encoded into a vector space. Okay so you can imagine in this vector space, imagine a 2D space, 00:05:57.280 |
we have vector A here, vector B here. Those two are very close to each other 00:06:04.240 |
and therefore they share some sort of similar meaning. Whereas vector C over here is very far 00:06:10.640 |
away from A and B. Therefore it shares less meaning with A and B. Now the result of this is 00:06:18.640 |
that these models are essentially creating a map of information. Using this map they're able to 00:06:24.560 |
consume data like images or text and output these useful, information-rich representations as 00:06:32.080 |
vectors. So our task in classification now is not to consume data and try and abstract 00:06:42.880 |
different meaning from that and classify that abstraction of meaning. In reality the abstraction 00:06:50.080 |
of meaning is already handled by the big models. Instead our task with classification is to teach 00:06:57.920 |
a smaller model to identify the different regions within that map or the vector space. Now a typical 00:07:06.000 |
architecture that we will see for classification is a pre-trained model followed by a linear layer. 00:07:12.400 |
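To make that concrete, here is a minimal sketch of this architecture in PyTorch. The `encoder` stands in for any pre-trained model that outputs embeddings, and the 512 dimension is an assumption for illustration; this is not the exact code from the video.

```python
import torch

class EmbeddingClassifier(torch.nn.Module):
    """A frozen pre-trained encoder followed by a trainable linear head."""

    def __init__(self, encoder, dim=512):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False  # never touch the big model's weights
        self.head = torch.nn.Linear(dim, 1)  # the only part we fine-tune

    def forward(self, x):
        with torch.no_grad():
            emb = self.encoder(x)  # information-rich vector embedding
        return self.head(emb)      # task-specific prediction (a logit)
```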
Now we can think of the internal weights of this classifier as actually being a vector within the 00:07:20.640 |
wider vector space. And Edo Liberty, the founder and CEO of Pinecone and past head of Amazon AI 00:07:29.760 |
Labs explained to me that we can actually use this fact and couple it with vector search in order to 00:07:37.360 |
massively optimize the learning process for our classifier. So what we need to do is really 00:07:45.600 |
imagine this problem as being within a vector space or a map. We have the internal model weights 00:07:51.600 |
w and we have all these vectors that as of yet are unannotated and we haven't fine-tuned on them yet. 00:07:58.560 |
We want to calculate the dot products between w and x. If they share a positive direction 00:08:04.560 |
they will have a positive value and they produce a negative score if the directions are opposite. 00:08:11.440 |
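As a tiny illustration of that sign behavior, with made-up 2-dimensional vectors:

```python
import numpy as np

w = np.array([1.0, 0.5])    # the model weights as a vector
a = np.array([0.9, 0.6])    # roughly the same direction as w
c = np.array([-1.0, -0.4])  # roughly the opposite direction

print(np.dot(w, a))  # positive score: shares direction with w
print(np.dot(w, c))  # negative score: points away from w
```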
Now there is just one problem with dot product here. It considers both direction and magnitude, 00:08:16.720 |
which means that a vector x with a large magnitude can produce a larger dot product score than 00:08:24.800 |
another vector that points in almost exactly the same direction as our model weights but has a 00:08:31.520 |
smaller magnitude. So what we need to do is normalize all these vectors that 00:08:37.440 |
we're comparing. This simply removes the magnitude problem and means that we are comparing only 00:08:44.960 |
the direction of the vectors. Now when we fine-tune the linear classifier with these vectors 00:08:52.800 |
it's going to learn to align itself with vectors that we label as positives and move away from 00:09:00.800 |
vectors we label as negatives. Now this will work really well but there are still some improvements 00:09:08.160 |
that we could add in here. First imagine we return only irrelevant samples in a single training batch. 00:09:16.240 |
They will all be marked as negative one and the classifier knows to move away from these values 00:09:21.280 |
but it doesn't know in which direction. Okay and especially in a high dimensional space there are a 00:09:25.840 |
lot of directions that the classifier can move in. So this is problematic because it means that the 00:09:31.360 |
classifier is just going to be moving at random away from those negative vectors. Another problem 00:09:37.280 |
is that some samples may be more or less relevant than others. So imagine we had the query dogs in the snow, and then 00:09:45.280 |
we had two pieces of text a dog and a dog in the snow. Both of those are relevant depending on what 00:09:53.840 |
you're looking at but a dog in the snow is more relevant. These two pieces of text are not equally 00:10:01.600 |
relevant, but at the moment all we can do is label one as negative and one as positive, or both as 00:10:09.520 |
positives. That's not really ideal because it doesn't really show the full picture: both of 00:10:16.720 |
these are relevant, just one more than the other. So what we need is almost like a gradient of 00:10:21.920 |
relevance. We need a continuous range from negative (e.g. minus one) to positive (e.g. plus one). Even if 00:10:29.680 |
we just have a range from negative one to negative 0.8 there's still a direction that the model can 00:10:37.440 |
figure out from that range of values. So all of this together just allows our linear classifier 00:10:44.720 |
to learn where to place itself within the vector space produced by the model layers preceding it. 00:10:50.560 |
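One simple way to train against graded labels like this is to map the [-1, 1] scores onto [0, 1] soft targets for a binary cross-entropy loss. The mapping below is my assumption of a reasonable implementation, not necessarily what the notebooks do:

```python
import torch

clf = torch.nn.Linear(512, 1, bias=False)  # the weights w live here
loss_fn = torch.nn.BCEWithLogitsLoss()

# stand-ins for normalized embeddings from the pre-trained model
embeddings = torch.randn(2, 512)
embeddings = embeddings / embeddings.norm(dim=-1, keepdim=True)

# graded relevance: "a dog in the snow" scored higher than "a dog"
scores = torch.tensor([1.0, 0.6])  # continuous range [-1, 1]
targets = (scores + 1) / 2         # mapped to [0, 1] soft targets

loss = loss_fn(clf(embeddings).squeeze(-1), targets)
loss.backward()  # nudges w toward the higher-scored vectors
```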
Now that describes a fine-tuning process but we can't do this across our entire data set. 00:10:57.520 |
If we have like a big data set which we probably do it would take too much time annotating everything 00:11:03.920 |
and it would be a waste of our time as well. To do this efficiently what we must do is capitalize 00:11:10.480 |
on the idea of identifying relevant versus irrelevant vectors within a proximity of the 00:11:17.040 |
model's learned weights w. So we focus our efforts on the specific area that is actually going to be 00:11:24.560 |
helpful. For an already trained classifier those are going to be the false positives and false 00:11:29.360 |
negatives predicted by the classifier. However we also usually don't have a list of false negatives 00:11:37.680 |
and false positives but we do know that the solvable errors will be present near the classifier's 00:11:44.560 |
decision boundary, i.e. the line that separates the positive predictions from negative predictions. 00:11:50.640 |
So we use vector search in order to actually pull in the high proximity samples that are most 00:11:58.000 |
similar to the model weights w. We then label those vectors and use them for training our model. 00:12:05.200 |
The model optimizes those internal weights w. We extract them again and then we perform a vector 00:12:11.200 |
search with them again and we just keep repeating this process over and over again until the linear 00:12:16.960 |
classifier has been optimized and is producing the correct predictions that we need. So by focusing 00:12:24.000 |
annotation and training on these essential samples we avoid wasting time and compute on 00:12:30.560 |
those vectors that don't make as much of a difference. Okay so all of that is the general 00:12:36.640 |
idea behind this process. Now let's have a look at how we can put all that together and fine-tune a 00:12:43.200 |
classifier with vector search. Now we will see that there are two parts to the training process. 00:12:49.200 |
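Before stepping through those two parts, here is the whole loop condensed into a sketch. The `annotate` and `fit` callables are placeholders for the human labeling step and the training step covered below:

```python
def active_learning_loop(index, w, annotate, fit, rounds=10):
    """Repeated query -> label -> train loop around the linear classifier."""
    for _ in range(rounds):
        # 1. retrieve unseen records closest to the current weights w
        xc = index.query(vector=list(w), top_k=10,
                         filter={"seen": 0}, include_values=True)
        matches = xc["matches"]
        # 2. a human scores each returned record in [-1, 1]
        scores = annotate(matches)
        # 3. mark those records as seen so they are not returned again
        for m in matches:
            index.update(id=m["id"], set_metadata={"seen": 1})
        # 4. fine-tune the classifier and extract the updated weights
        w = fit(matches, scores)
    return w
```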
First we need to index our data so that is where we embed everything using the preceding model 00:12:55.680 |
layers e.g. BERT or CLIP or so on and then store those in a vector database and then step two is 00:13:01.920 |
that we actually fine-tune the classifier. So query with model weights w, return the most similar 00:13:08.240 |
records, annotate them and then use them to fine-tune the classifier. So let's go ahead and 00:13:14.800 |
start with indexing. Given a data set of images or other formats we first need to process everything 00:13:22.240 |
through the big model preceding our linear classifier to create the vector embeddings. 00:13:28.320 |
For our example we're going to use a model called CLIP that's capable of understanding both 00:13:32.960 |
text and images. It has been trained on text-image pairs and has learned how to encode 00:13:41.120 |
matching pairs as closely as possible in a shared vector space. So what we're going to need to start with before indexing 00:13:47.200 |
anything is initializing a data set that we can then encode with CLIP. So we're going to use this 00:13:54.160 |
data set from Hugging Face datasets hub. So we can pip install everything we're going to need for 00:14:00.000 |
this here. We're taking the train split and that contains about 9,500 images. Some of those are radios 00:14:08.560 |
like you can see here, there's pictures of dogs, trucks and a few other things. And we can see 00:14:16.000 |
an array of one of those images right there. Now it's not so important for what we're doing here. 00:14:22.640 |
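For reference, pulling a dataset like this from the Hugging Face hub looks roughly like the snippet below. The exact dataset ID is my assumption (an ImageNet-style subset with around 9,500 training images); use whichever dataset the linked notebook loads:

```python
from datasets import load_dataset

# dataset ID and config are assumptions -- check the linked notebook
data = load_dataset("frgfm/imagenette", "full_size", split="train")

print(len(data))         # roughly 9,500 images in the train split
print(data[0]["image"])  # a PIL image (dogs, radios, trucks, ...)
```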
What we do want to do is actually initialize both the model and the pre-processing steps 00:14:30.160 |
before the data is being fed into the model. So we do that here. So initialize the model CLIP 00:14:35.600 |
using this model ID here. Okay so this is one version of the CLIP model. And then the pre-processor 00:14:43.920 |
will just take images and process them so that CLIP can read them. Okay, so all we're doing here 00:14:50.160 |
is going through all of these steps. This is the pre-processing, and from that we get 00:14:57.040 |
the image features. Those image features are a vector representation of the image. So in this 00:15:05.680 |
case we've done the Sony radio image and that gives us a 512 dimensional vector embedding. 00:15:15.840 |
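A sketch of those steps with the `transformers` library, assuming the ViT-B/32 version of CLIP (which produces the 512-dimensional embeddings mentioned here) and reusing `data` from the previous snippet:

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-base-patch32"  # one version of the CLIP model
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

# pre-process one image and encode it into a vector embedding
inputs = processor(images=data[0]["image"], return_tensors="pt")
with torch.no_grad():
    emb = model.get_image_features(**inputs)  # shape (1, 512)
```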
The embeddings from CLIP are not normalized. Okay so we're going to be using dot product both 00:15:22.000 |
within the model and during our vector search. So we should really normalize these. So we do that 00:15:28.720 |
here and then we see that these values are all between the values of negative one and positive 00:15:36.880 |
one. Now that's how we embed or create a vector embedding for a single item. But we're going to 00:15:43.680 |
want to do for loads of items and we're also going to want to index them and store them inside a 00:15:48.160 |
vector database. So we're going to use Pinecone for this. You may need to sign up for a free API 00:15:54.560 |
key if you haven't already. And what we do is initialize our connection to Pinecone here. 00:16:00.720 |
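A sketch of that setup, using the classic `pinecone-client` API from around the time of this video; newer versions of the SDK use `pinecone.Pinecone(...)` instead, so check the current docs. The environment value is an assumption:

```python
import pinecone

pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")

pinecone.create_index(
    "classifier-train",   # the index name doesn't actually matter
    dimension=512,        # must match CLIP's embedding size
    metric="dotproduct",  # we compare vectors with dot product
    metadata_config={"indexed": ["seen"]},  # filterable "seen" flag
)
index = pinecone.Index("classifier-train")
```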
You just put your API key here. It's all free. And then we create an index. Now it's important 00:16:07.040 |
that we have a few things here. So the index name that doesn't actually matter. Okay you can put 00:16:11.600 |
whatever you want. But what you do need is the correct dimensionality. So that is the 512 that 00:16:18.800 |
you saw up here. That is what we put in here. We do need to make sure that we're using dot product 00:16:26.080 |
similarity. And we're going to also include this metadata config. So basically, once we see an 00:16:33.120 |
image and we label it we're going to tell Pinecone we don't want to return that image again. Okay so 00:16:38.560 |
that we can go through and not over optimize on like 10 images. And then we connect to the index 00:16:46.880 |
after we have created it there. Now, to add the single image embedding 00:16:55.120 |
that we just created, we would do this. Okay so we have an ID and then we just 00:17:00.800 |
convert the embedding into a list format and we just upsert. So with that we have one embedding 00:17:09.680 |
within our vector index. But of course we want to have our full data set in there so we can search 00:17:16.000 |
for it and add data and so on. So to do that we're going to use this loop here. I'm not going to go 00:17:23.440 |
through it, because it's literally what we've just done. Okay, the only thing I think I've added here 00:17:27.760 |
is this which is checking for grayscale versus RGB images. But the rest of this is exactly the same. 00:17:36.880 |
Okay, we're just doing it all at a larger scale and we're also adding in the metadata 00:17:42.240 |
here. Okay so that's the seen flag. We're setting it to zero for all the images to start with and then 00:17:46.640 |
we'll set it to one once we've seen a set of images. Mark them as you know positive or negative 00:17:53.600 |
and train with them. Then we set that seen value to one so we don't return it again. 00:17:58.240 |
Okay so we have this radio. Let's have a quick look at how we might query. 00:18:05.840 |
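A minimal query sketch, again reusing the objects defined above:

```python
import torch

# embed an image, normalize it, and use it as the query vector xq
inputs = processor(images=data[0]["image"], return_tensors="pt")
with torch.no_grad():
    xq = model.get_image_features(**inputs)
xq = xq / xq.norm(dim=-1, keepdim=True)

res = index.query(vector=xq[0].tolist(), top_k=5)
for match in res["matches"]:
    print(match["id"], match["score"])
```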
So we create our query vector xq here, which is just doing the same thing again as what 00:18:13.840 |
we did before. Normalizing it and then we query with it. Okay and that returns these items here 00:18:20.880 |
from Pinecone. Let's have a look at what they look like. So the first one is obviously that radio. 00:18:25.600 |
That radio is the most similar vector. So naturally that would be the first thing that 00:18:30.000 |
gets returned. Okay next one we have a car radio. We have another Sony radio. I think it's even the 00:18:36.640 |
same model. And another Sony radio which is also the same model. It seems so. And then just another 00:18:45.600 |
radio. It's very similar. So clearly those embeddings from CLIP are pretty good. But now 00:18:51.360 |
what we want to do is fine-tune a linear classifier on top of that to classify these different images. 00:18:58.960 |
Okay so to do that I'm going to start from scratch. So this is a new notebook. You can find all the 00:19:04.400 |
links to these notebooks by the way in the video description or if you're watching this on the 00:19:09.360 |
article down at the bottom of the article in the resources section. So here initialize the connection 00:19:15.600 |
to the index again. You don't need to do this if you just ran through the last bit of code. You can 00:19:19.600 |
just keep that as it is and maintain your connection to the index. Again we're going to 00:19:26.240 |
load the data set and again you don't need to do that if you've already done it. Initialize the 00:19:33.760 |
model. So CLIP and the processor. There's one thing different here: you can actually 00:19:40.480 |
tokenize using the processor we already have, but for the sake of covering everything I'm showing you 00:19:46.080 |
how to do it with CLIPTokenizerFast here as well. So here we're initializing just the tokenizer 00:19:52.160 |
side of the CLIP preprocessor. And we're setting up this prompt, dogs in the snow. We tokenize 00:19:58.960 |
them to get a set of token IDs and then we use the model's get_text_features method in order to 00:20:06.720 |
get a vector embedding of that text, of that dogs in the snow prompt. Okay and we come down here. We 00:20:15.280 |
create the query vector from that and we're just going to retrieve the top 10 most similar records and 00:20:22.400 |
store them in xc, which is just the contents of the response. So there are a few things in xc here. 00:20:29.360 |
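Sketched out, that text-to-query step might look like this, with `model`, `model_id`, and `index` as defined earlier:

```python
import torch
from transformers import CLIPTokenizerFast

tokenizer = CLIPTokenizerFast.from_pretrained(model_id)

prompt = "dogs in the snow"
tokens = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    text_emb = model.get_text_features(**tokens)  # (1, 512) text embedding
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

# retrieve the top 10 most similar records, including their vector values
xc = index.query(vector=text_emb[0].tolist(), top_k=10, include_values=True)
```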
We actually don't need all of this. So what we want is the IDs and then the values as well. 00:20:37.920 |
So first we get the IDs, then we get the values. Okay, and we can see what it's returned for dogs 00:20:45.840 |
in the snow. Right this one is not a dog in the snow but you can kind of see where it's a bit 00:20:49.360 |
confused. The sand in the background does look kind of white and snowy. But then the rest of 00:20:55.760 |
these yeah they're dogs in the snow other than this one. So it's returning the right thing here 00:21:03.440 |
but let's say we don't want dogs in the snow. Okay let's say we want to adjust this to something 00:21:09.680 |
slightly different. Like for example dogs at dog shows and we'll go through this. So this code here 00:21:17.040 |
not really that important. All this is a little interface that I built within Jupyter so that we 00:21:23.760 |
can sort of quickly go through and label the images. So I would run this. Okay I'm not going 00:21:33.040 |
to run it again. So I'll just run this here and basically what it's going to do is it's going to 00:21:38.240 |
show an image. So for example this one here: it's going to show the image and ask what you'd rate it 00:21:43.360 |
from negative one to one. And you just go through and say what you would rate it. 00:21:49.280 |
And then that will basically produce a dictionary that maps these IDs to the 00:21:56.240 |
score that you gave it. So you can see all the scores I gave last time I ran this. And you can 00:22:02.640 |
just double check that the IDs and scores are aligned here. Yes they are, so you don't need to 00:22:07.840 |
worry so much about that. And then we just need to get the values, which are going to be the input 00:22:14.320 |
of training data for the linear classifier. And then we get the labels okay so the scores. 00:22:18.880 |
So we go through and what we're going to do here is just initialize a PyTorch linear classifier 00:22:27.200 |
layer. And what I do first is this: in most cases I imagine that we're going to have a linear 00:22:35.520 |
classifier already trained. So I'm just emulating that here. So I'm getting the 00:22:40.080 |
query vector reshaping that and I'm inserting it as the first set of model weights w. And 00:22:49.520 |
what we're going to do is we're going to initialize the loss. We're going to use BCEWithLogitsLoss. 00:22:54.320 |
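Condensed into a sketch, that setup plus the `fit` loop described next might look like this. The learning rate, iteration count, and the mapping of [-1, 1] scores onto [0, 1] targets are assumptions for illustration; `text_emb` is the query embedding from earlier:

```python
import torch

clf = torch.nn.Linear(512, 1, bias=False)
with torch.no_grad():
    # emulate an already trained classifier by using the text query
    # vector as the initial model weights w
    clf.weight.copy_(text_emb.reshape(1, 512))

loss_fn = torch.nn.BCEWithLogitsLoss()
opt = torch.optim.SGD(clf.parameters(), lr=0.1)  # deliberately high lr

def fit(X, y, iters=50):
    """X: (n, 512) labelled embeddings; y: (n,) scores in [-1, 1]."""
    targets = (y + 1) / 2  # soft targets in [0, 1]
    for _ in range(iters):
        opt.zero_grad()
        loss = loss_fn(clf(X).squeeze(-1), targets)
        loss.backward()
        opt.step()
    return clf.weight.detach()[0]  # the updated weights w
```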
And we're going to use stochastic gradient descent. Now this learning rate you'll probably find that's 00:22:59.680 |
quite high. And it is high. We're just kind of putting it high so that we can see a lot of like 00:23:06.160 |
quick movement through the data set. If you're actually implementing something like this you 00:23:09.920 |
might want to use a lower learning rate. So with that we just create this function fit here which 00:23:19.120 |
is basically just a training loop. And we can set the number of iterations per training loop. Again 00:23:24.320 |
you might want to lower this if you don't want to move so quickly through the vector space and keep 00:23:28.320 |
things a bit more stable. And yeah, we'll just call fit. From that, the model weights will actually 00:23:37.440 |
be optimized and it will change. And that will represent the next query that we're going to pass 00:23:44.320 |
into our vector database. So we convert it into a flat list so that we can query 00:23:50.560 |
Pinecone with it. And so that we're not returning the same records that we just went 00:23:57.280 |
through. We update the metadata attached to each one of the vectors that we've just seen 00:24:02.800 |
to be set to seen equals one. And then the reason we do that is because we add a filter now 00:24:13.200 |
to the next query where we set seen equal to zero. Okay and then we return the next set of 00:24:19.120 |
results, and we can see here we have some other images. And basically what I'm doing here is 00:24:23.680 |
trying to optimize for dogs in fields. And then from dogs in fields we're going to try and move 00:24:29.440 |
to dogs at dog shows. Okay and we'll just go through this bit quickly now. So this is just 00:24:35.120 |
tuning. So I'm putting all of what we just did into a single function just to make things a bit 00:24:39.760 |
simpler. And yeah we'll go through. Okay so you can see how things are kind of changing more towards 00:24:46.160 |
dogs in fields here. And then here it goes a bit crazy because basically I'm marking a lot of 00:24:53.040 |
dogs as negative. So now it's thinking that maybe I don't actually want to see any dogs. 00:24:57.120 |
And that makes it push away from that. But obviously I don't want that to happen. So I just 00:25:04.480 |
set everything negative here other than I think this image that has a field or maybe this image 00:25:10.000 |
that has a field and also this image of a dog. And then we go towards dogs again. Focus on that. 00:25:17.360 |
Push towards dogs. And then here, in the middle right here, you can see the first 00:25:23.360 |
image of dogs at a dog show. Actually I think this is also a dog show here. So that would technically 00:25:28.240 |
be the first one. But this is what I'm looking for. More like this sort of image. So we focus 00:25:34.160 |
on that and we push for that a little more. Next one we see oh okay we have a few more dog shows 00:25:39.600 |
here. So here and here. And we keep pushing for that. And you can see as we go through each step 00:25:45.920 |
there's more of these dogs in dog shows. Because that's what I'm labeling as being more relevant. 00:25:51.280 |
Okay and now we're really getting into that sort of space. Keep going and now we're at the point 00:25:59.520 |
where pretty much everything we're returning is a dog show. So this is the final bit. So now 00:26:06.560 |
that we've done that, we want to set all of the seen labels in our vector database back to not 00:26:13.760 |
seen. Okay, because we want to search again. We can either search without the filter just to check 00:26:19.600 |
that it has trained the classifier. Or we just reset all of those seen labels. If you wanted 00:26:27.280 |
to go through data again and focus more on those that's where you might want to reset all the 00:26:32.720 |
labels back to zero. So to do that all I'm going to do is go through a while loop. And we keep 00:26:39.440 |
going through and we search for everything where seen is equal to one. We get those IDs 00:26:44.880 |
and then we mark them as not seen. Once we don't return any more items that means we've 00:26:50.480 |
set everything to not seen, because we're not returning anything else with seen equal to 00:26:56.240 |
true. So at that point we break. So after that, if we search again, we get a completely unfiltered 00:27:05.120 |
view of the search results. And here we go. Okay so we can see loads of dogs at dog shows. 00:27:13.680 |
Now there's one here that isn't a dog at a dog show. I think the rest of them are. 00:27:18.960 |
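For completeness, the reset loop just described might look roughly like this, assuming the classic client's `update` method with `set_metadata`:

```python
# reset every "seen" flag so future searches are unfiltered;
# any query vector works here, we only care about the filter
while True:
    res = index.query(vector=xq[0].tolist(), top_k=100, filter={"seen": 1})
    ids = [m["id"] for m in res["matches"]]
    if not ids:
        break  # nothing left is marked as seen
    for _id in ids:
        index.update(id=_id, set_metadata={"seen": 0})
```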
So with that we've actually fine-tuned our classifier. So now that we've finished 00:27:25.520 |
optimizing those model weights we can save them to file. Okay so we do this. And with that let's 00:27:31.920 |
have a look at how the model performs on actually classifying images. So again move to another 00:27:39.600 |
notebook. This is number 02 classifier tests. And here we're just going to test the classifier 00:27:46.960 |
on a set of images that it has not seen before. So again we initialize everything. Again if you've 00:27:55.040 |
already loaded everything and you're in the same notebook you don't need to do this. 00:28:00.240 |
So we need to load the validation split from ImageNet. So you can see here this before was 00:28:08.480 |
train. Now it's validation. So you will need to rerun this bit. And we have about 4,000 images 00:28:15.040 |
there. Now let's start by checking the predictions for some specific images. Okay so this one is a 00:28:23.120 |
dog at a dog show. So we pre-process that. We get the image features from CLIP. And then we make a 00:28:32.240 |
prediction. So we call the classifier and pass in the vector output by CLIP. And we can see 00:28:39.040 |
there's a pretty positive value there. So positive, remember, means a true prediction; negative means a false 00:28:48.880 |
prediction. Okay cool. So that's correct. It's predicted that that is a dog show. 00:28:55.840 |
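A sketch of that prediction step, assuming the fine-tuned weights were saved earlier with something like `torch.save(clf.state_dict(), "classifier.pt")` and that `val_data` is the validation split loaded the same way as the train split; both names are assumptions:

```python
import torch

clf = torch.nn.Linear(512, 1, bias=False)
clf.load_state_dict(torch.load("classifier.pt"))

# embed an unseen validation image with CLIP, then score it
inputs = processor(images=val_data[0]["image"], return_tensors="pt")
with torch.no_grad():
    emb = model.get_image_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)
    logit = clf(emb).item()

print(logit)  # > 0 -> predicted "dog show"; < 0 -> not a dog show
```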
Now let's have a look at this. Okay this is not a dog show. So we should see that it will predict 00:29:00.640 |
a negative value. So let's go through and yeah we get a pretty negative value there. 00:29:05.840 |
So we can label the full data set and we'll find a cutoff point between what is viewed as 00:29:11.760 |
relevant and what is irrelevant. So basically anything that's positive. So we do that here. 00:29:17.520 |
I'm not going to go through it but it's essentially the same thing as what we just said. I'm just 00:29:21.520 |
making a list of these predictions. Okay I'm going to add a column to the ImageNet data set 00:29:28.640 |
called predictions. So we now have these three columns. And let's have a look. So we filter out any results 00:29:35.440 |
where the prediction is not positive. So we get 23 results. And let's have a look at what those are. 00:29:42.560 |
So those 23 positive results. All of them, I think almost all of them, are dog shows. 00:29:52.560 |
And we keep going through. So each one of these, as we go through, has been scored 00:29:58.320 |
less highly, but all of these are still scored very highly. Okay, I'm going through. 00:30:08.080 |
And then we go through and then yeah we get this, I don't know, emoji chainsaw thing which is right 00:30:15.680 |
at the bottom of these positively labeled things. It's kind of random. I don't know why it's in 00:30:20.960 |
there. Yeah so other than literally these two images right at the end, everything else is a 00:30:28.000 |
true positive. So it's predicted everything correctly other than these two. This one, 00:30:33.600 |
no idea why. This one, I kind of understand, you know, dogs in a field. So generally speaking these 00:30:40.320 |
are I think very good results. And we got these from fine-tuning our classifier on 00:30:46.960 |
not really that many images. I think there were maybe 50 images there. So really good results 00:30:53.920 |
on a very small amount of data. And that's because we're using vector search to focus our annotation 00:31:00.720 |
and training on what is the most important part of the data set. Now doing this for an image 00:31:07.360 |
classifier is just one example. We can do this with text. We can do this in recommendation engines 00:31:13.600 |
or even anomaly detection. There's like a huge number of use cases with this. Basically whenever 00:31:19.920 |
you need to classify something and you want to do that efficiently, you can use this as long 00:31:26.640 |
as you're using something like a linear classifier. So for me, I think that was a really cool method 00:31:35.920 |
for efficiently training classification models. Thank you a lot to Edo for actually sharing 00:31:42.480 |
this with me and explaining and walking me through everything. I think, yeah, this is a really 00:31:48.080 |
useful method and I hope you will find it useful as well. So thank you very much for watching and