
Making The Most of Data: Augmented SBERT


Chapters

0:00
7:01 Language
7:28 Data Augmentation Techniques
9:28 Contextual Word Embeddings
12:14 Cross Encoder
15:16 Data Augmentation
28:23 Create the Unlabeled Data Set
34:52 Remove Any Duplicates
35:53 Predicting the Labels with Our Cross Encoder
45:11 Pooling Layer

Transcript

In this video, we're going to have a look at how we can make the most of limited data using language data augmentation strategies and training approaches. More specifically, we're going to focus on something called Augmented SBERT. You may or may not be aware that the past decade has seen a sort of renaissance, or explosion, in the field of machine learning and data science. A lot of that, especially the early progress with things like perceptrons and recurrent neural networks, was researched and discovered back in the 50s, 60s, and 70s, but we didn't see it applied in industry, or anywhere really, until the past decade. There are two main reasons for this.

The first is that we didn't have enough compute power back in the 50s, 60s, and 70s to train the models we needed to train, and we also didn't have the data to actually train those models. Now, compute power is not really a problem anymore. If we look at this graph, it depends on what model you're training, of course: if you're OpenAI and you're training GPT-4 or 5 or whatever, then maybe compute power is still pretty relevant. But for most of us, we can get access to cloud machines, if not personal machines, and we can wait a few hours or a couple of days and fine-tune or pre-train a transformer model that performs well enough for what we need. Now, that obviously wasn't always the case until very recently. Back in the 1960s, you can see on this graph here,

we have the IBM 704, and on the y-axis we have floating-point operations per second, on a logarithmic scale; on a linear scale it would basically look like a flat line until a few years ago, then shoot straight up. It's pretty impressive how much progress has been made in terms of computing power. Like I said, that's not really an issue for us anymore; in most cases we have the compute to do what we need to do. And data is not as much of a problem anymore either, but we'll talk about that in a moment. With data, again, we have a very big increase, though not quite as big as with computing power, and this graph doesn't go quite as far back; it only starts in 2010, when I believe global data volume was around 2 zettabytes, growing to many times that by 2021. So there's a fairly big increase over time, not quite as dramatic as compute power, but still pretty massive. Now, the thing with data is: yes, there's a lot of data out there, but is there that much data for what we need to train our models to do? In a lot of cases, yes, there is, but it really depends on what you're doing, particularly if you're focusing on the more niche domains.

So what I have here on the left are a couple of niche domains. There's not that much data out there on sentence pairs for climate evidence and claims, for example, where you have a piece of evidence and a claim, and a label for whether the claim is supported by the evidence or not. There is a small dataset for this, the Climate-FEVER dataset, but it's not big. For agriculture, I assume within that industry

there's not that much data, although I've never worked in that industry, so I'm not fully sure; I just assume there's probably not that much. And then there's also niche finance, which I do at least have a bit more experience with, and I imagine this is probably something a lot of you will find useful as well, because finance is a big industry.

There's a lot of finance data out there, but there are a lot of niche little projects and problems in finance where you'll find much less data. So, yes, we have a lot more data nowadays, but we don't have enough for a lot of use cases. On the right here, we have a couple of examples of low-resource datasets, such as debates from the Maldives, and low-resource languages as well.

So with these, we need to find a different approach. One thing we can investigate, depending on your use case, is the unsupervised training method TSDAE, which we covered in a previous video and article. That does work when you're trying to build a model that recognizes generic similarity, and it works very well too.

But, for example, with the climate claims data, we're not necessarily trying to match sentence A and sentence B based on their semantic similarity; we're trying to match sentence A, which is a claim, to sentence B, which is evidence, based on whether that evidence supports the claim or not. In that case, an unsupervised approach like TSDAE doesn't really work. So what we have is very little data, and there aren't really any alternative training approaches we can use.

So basically, what we need to do is create more data. Now, data augmentation is difficult, particularly for language. Data augmentation is not specific to NLP; it's used across ML, and it's more established in the field of computer vision. That makes sense, because in computer vision, say you have an image: you can modify that image using a few different approaches, and a person can still look at that image and think, okay, that is the same image, it's just maybe rotated a little bit.

We've changed the color grading or brightness, or something along those lines; we've modified it slightly, but it's still, in essence, the same image. For language, it's a bit more difficult, because language is very abstract and nuanced: if you start randomly changing certain words, the chances are you're going to produce something that doesn't make any sense, and when we're augmenting our data, we don't want to just throw rubbish into our model.

We want something that makes sense. So there are some data augmentation techniques for language, and we'll have a look at a couple of the simpler ones now. There is a library called nlpaug, which I think is very good for this sort of thing; it's essentially a library that allows us to do data augmentation for NLP. What you can see here are two methods using word2vec vectors and similarity: we're taking this original sentence, 'the quick brown fox jumps over the lazy dog', and we're just inserting some words using word2vec.

So we're trying to find which words word2vec thinks could go in here, which words are most similar to the surrounding words. We have this 'Alessiari', which I don't know; it seems like a name to me, and I don't think it really fits, so it's not great, it's not perfect. 'Lazy superintendents dog' does kind of make sense.

I feel like a lazy superintendent's dog is maybe a stereotype, or I'm sure it's been in The Simpsons or something before. So, okay, fair enough, I can see how that got in there, but again, it's a bit weird; it's not great. Substitution, for me, seems to work better: rather than 'the quick brown fox', we have 'the easy brown fox', and rather than jumping over the lazy dog, it jumps around the lazy dog, which changes the meaning slightly. 'Easy' is a bit weird there, to be fair, but we still have a sentence that kind of makes sense, so that's good.

Now, we don't have to use word2vec; you can also use contextual word embeddings, like with BERT, and for me these results look better. For insertion, we get 'even the quick brown fox usually jumps over the lazy dog', so we're adding words that make sense, which I think is good. For substitution, we're amending one word and changing it to 'a little quick brown fox' instead of just 'quick brown fox', so I think that makes sense.
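If you want to try this yourself, here's a minimal sketch using nlpaug's contextual augmenters; the exact outputs will vary, and note that augment returns a list in newer nlpaug versions:

```python
import nlpaug.augmenter.word as naw  # pip install nlpaug

text = "The quick brown fox jumps over the lazy dog"

# contextual insertion and substitution using BERT
insert_aug = naw.ContextualWordEmbsAug(model_path="bert-base-uncased", action="insert")
subs_aug = naw.ContextualWordEmbsAug(model_path="bert-base-uncased", action="substitute")

print(insert_aug.augment(text))  # inserts contextually plausible words
print(subs_aug.augment(text))    # swaps words for contextually similar ones

# the word2vec approach uses naw.WordEmbsAug instead, but that needs a local
# vectors file, e.g. GoogleNews-vectors-negative300.bin
```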

This is a good way of augmenting your data, getting more data from less. But for us, because we're using sentence pairs, we can basically just take all of the data from, say, a dataframe with sentence A and sentence B columns. We have all of these sentence As and all of these sentence Bs; if we take one sentence A, it's already matched up to one sentence B, and what we can do is randomly sample some other sentence Bs and match them up to that sentence A, so that we now have three more pairs.

Okay, so if we did this, if we took three sentence As and three sentence Bs and made new pairs from all of them, so not really random sampling, just taking all the possible pairs, we'd end up with nine pairs in total, which is much better. If you extend that a little further, from just a thousand pairs we can end up with one million pairs. So you can see how quickly you can take a small dataset and create a big dataset with it.
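To make the arithmetic concrete, a tiny sketch of the all-pairs logic:

```python
from itertools import product

a = ["a1", "a2", "a3"]  # three sentence As
b = ["b1", "b2", "b3"]  # three sentence Bs

pairs = list(product(a, b))  # every possible (A, B) combination
print(len(pairs))  # 9

# the growth is quadratic: 1,000 unique sentences per side -> 1,000,000 pairs
```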

Now, this is just one part of the problem, though, because our smaller dataset will have similarity scores or natural language inference labels, but the new dataset we've just created, the augmented dataset, doesn't have any of those; we just randomly sampled new sentence pairs.

So there are no scores or labels, and we need those to actually train a model. What we can do is take a slightly different approach, or add another step here. That other step uses something called a cross-encoder. In semantic similarity, we can use two different types of models:

we can use a cross-encoder, over here, or we can use a bi-encoder, which is what I would usually call a sentence transformer. A cross-encoder is the old way of doing it, and it works by simply putting sentence A and sentence B into a BERT model together, at once: we have sentence A, a separator token, and sentence B, and we feed that into a BERT model.

From that BERT model, we get all of our output embeddings, over here, and they all get fed into a linear layer, which converts them into a similarity score, up here. That similarity score is typically going to be more accurate than a similarity score you'd get from a bi-encoder or sentence transformer, but the problem is this: from our sentence transformer, we output sentence vectors, and if we have two sentence vectors, we can perform a cosine similarity or Euclidean distance calculation to get the similarity of those two vectors. A cosine similarity calculation is much quicker than a full BERT inference step, which is what we need for every pair with a cross-encoder.

I think it's something like this: clustering 10,000 sentences using a BERT cross-encoder would take you something like 65 hours, whereas with a bi-encoder it's going to take you about five seconds. So it's much, much quicker, and that's why we use bi-encoders, or sentence transformers. The reason I'm talking about cross-encoders is that we get this more accurate similarity score, which we can use as a label. Another very key thing here is that we need less data to train a cross-encoder than a bi-encoder: I think the SBERT model itself was trained on something like one million sentence pairs, and some newer models are trained on a billion or more, whereas we can train a reasonable cross-encoder on something like 5K sentence pairs, or maybe even less. So we need much less data.
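In code, the difference between the two looks something like the following; the model names here are just illustrative pre-trained checkpoints from the Hugging Face hub, not the models trained later in this video:

```python
from sentence_transformers import CrossEncoder, SentenceTransformer, util

pair = ["It is raining heavily", "The weather is very wet today"]

# cross-encoder: a full BERT forward pass for every pair -> accurate but slow
ce = CrossEncoder("cross-encoder/stsb-roberta-base")
score = ce.predict([pair])  # one similarity score per pair

# bi-encoder: encode each sentence once, then compare vectors cheaply
bi = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = bi.encode(pair)
similarity = util.cos_sim(embeddings[0], embeddings[1])
```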

So we need much less data and That works quite well. We've been talking about the data orientation. We can take a small data set we can augment it to create more sentence pairs and Then what we do is train on the original data set, which we call the gold data set we train our cross encoder using that and Then we use that fine-tune cross encoder to label the augmented Data set without labels and that creates a augmented label data set that we call the silver data set So That sort of strategy of creating a silver data set which we would then use and to fine-tune our by encoder model is What we refer to as the in domain Augmented Expert Training strategy Okay, and this so what you can see this flow diagram is basically every set that we need to do to Create an in domain or spurt training process So we we've already described most of this so we get our gold data set the original data set That's gonna be quite small.

Let's say one to five thousand labeled sentence pairs. From there, we're going to use something like random sampling to create a larger dataset; let's say we create something like a hundred thousand sentence pairs. But these are not labeled; we don't have any similarity scores or natural language inference labels for these.

We don't have any Similarity scores or natural language inference labels or these So What we do is we take that gold data set and we take it down here and we fine-tune a cross encoder Using that gold data because we need less data to train a reasonably good cross encoder So we take that and we fine-tune cross encoder and then we use that cross encoder Alongside our unlabeled data set to create a new silver data set Now the cross encoder is going to predict the similarity scores or in a line labels or every pair So with that we have our silver data we also have the gold data, which is up here and we actually take both those together and we fine-tune the by encoder or the sentence transformer on both the gold data and the silver data now one thing I would say here is it's useful to Separate some of your gold data at the very start.

Don't even train your cross-encoder on those rows; it's good to set them aside as your evaluation or test set, and to evaluate both the cross-encoder performance and your bi-encoder performance on that separate set. Don't include it in the training data for any of your models; keep it separate, and then you can use it to figure out: is this working, or is it not?

So that is in-domain Augmented SBERT. This diagram is the same as what we saw before, just laid out as a training approach: we have the gold-trained cross-encoder, we have our unlabeled pairs, which come from random sampling on the gold data, we process those with the cross-encoder to create a silver dataset, and then the silver and the gold come over here to fine-tune a bi-encoder. That's it for the theory and the concepts; now what I want to do is actually go through the code and work through an example of how we can do this. Okay, so we've downloaded both the training and validation splits of the STSB data, and let's have a look at what some of that data looks like. So, stsb[0].
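For anyone following along, the download step looks roughly like this, pulling the STSB portion of GLUE with Hugging Face Datasets:

```python
from datasets import load_dataset

stsb = load_dataset("glue", "stsb", split="train")      # our gold data
dev = load_dataset("glue", "stsb", split="validation")  # held out for evaluation
print(stsb[0])
# something like: {'sentence1': ..., 'sentence2': ..., 'label': 5.0, 'idx': 0}
```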

So CSP zero So we have sentence pair sentence one sentence two Just a simple sentence and we have a label which is our similarity score now that similarity score varies from between 0 up to 5 where 0 is no similarity no relation between the two sentence pairs and 5 is they mean that that same thing now see here these two mean the same thing as we Now we can see here that these two mean the same thing as we would expect So we first want to modify that Score a little bit because we are going to be training using cosine similarity loss and we would expect our Label to not go up to a value of 5 but instead go to value 1.

So All I'm doing here is Changing that score so that we are dividing everything by by normalizing everything so we do that and no problem and Now what we can do is load our training data into a data loader. So to do that we first Form everything into a input example and then load that into into our pineclutch data loader So I run that and then at the same time during training I also want to Output a evaluation source.

So how did the cross-encoder do on the evaluation data? To do that, I import the CECorrelationEvaluator; here we're importing from sentence_transformers.cross_encoder.evaluation. Again I'm using InputExamples, as we're working with the sentence transformers library, and I'm including both the texts and the labels. Then I'm putting all of that validation data into the evaluator, okay, and I can run that.
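A sketch of that evaluator setup, reusing the dev split loaded earlier:

```python
from sentence_transformers.cross_encoder.evaluation import CECorrelationEvaluator

# the validation split, normalized the same way as the training data
dev_data = [
    InputExample(texts=[row["sentence1"], row["sentence2"]],
                 label=row["label"] / 5.0)
    for row in dev
]
evaluator = CECorrelationEvaluator.from_input_examples(dev_data)
```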

So to do that, we're going to import from sentence transform it so from Sentence transform it and I'll make sure I'm working in Python I'm going to import from cross encoder a Cross encoder. Okay, and To initialize that cross encoder model, I'll call it see all I Need to do is write cross encoder very similar to when we write sentence transformer initialize it and model we specify the model from face transformers that we like to Initialize a cross encoder from so birthdays in case and also number of labels that we'd like to use so in this case, we are just targeting a Similarity score between zero and one.

So we just want a Single label that if we were doing for example, NLI labels where we have entailment contradiction and Neutral labels or some other labels and we would change this to for example three, but in this case one We can initialize our cross encoder and then from now we move on to actually training so we call model or see dot fit and We want to specify The data loader so it's slightly different to the fit function.

We usually use of sentence transformers So we want train data loader we specify our loader that we Initialize just up here the data loader we Don't need to do this. But if you are going to evaluate your Model during training you also want to add in evaluator as well So this is from the CE correlation evaluator to make sure here using a cross encoder evaluation class we would like to run for Say one epoch and we should define this because I would also like to While we're training I would also like to include some warm-up sets as well We should I'm going to include a lot of warm-up sets actually and although I'll mention it.

I'll talk about it in a moment. So I Would say number of epochs Is equal to one and for the warm-up I would like to take integer so the length of loader so the number of Actions that we have now and our data set. I'm going to multiply this by 0.4.

So I'm going to Do a warm-up or do warm-up sets for 40% of our total data set size or batch or 40% of our total number of batches and We also need to multiply that by number of epochs. Let's say for training two epochs We do multiply that in this case just one so not necessary, but it's there so we're actually forming warm-up for 40% of the Training steps and I found this works better than something like 10% 15% 20% However that being said I Think you could also achieve a similar result by just decreasing the learning rate of your model.

So By default, so if I write the epochs here, we'll define the warm-up sets With warm-up so by default this we use optimizer params with a learning rate of 2 e to the minus 5 okay, so if you Say want to decrease that a little bit you could go.

let's say you go to 5e-6; this would probably have a similar effect to having such a significant number of warm-up steps, and in that case you could decrease the warm-up to 1% or 10%. But for me, from the way I've tested this, I've ended up going with 40% warm-up steps, and that works quite well. So the final step here is: where do we want to save our model?

So I'm going to say I want to save it as 'bert-stsb-cross-encoder'. We can run that, and that will run everything for us; I'll just make sure it's actually... yep, there we go, you can see it's running. I'm not going to run it fully, though, because I've already done it, so let me pause that and move on to the next step. Okay, so we now have our gold dataset, which we pulled from Hugging Face Datasets, and we've just fine-tuned a cross-encoder.
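For reference, the whole cross-encoder training call, roughly as described above:

```python
num_epochs = 1
# warm up for 40% of the total number of training steps
warmup = int(len(loader) * num_epochs * 0.4)

ce.fit(
    train_dataloader=loader,
    evaluator=evaluator,  # optional: evaluates during training
    epochs=num_epochs,
    warmup_steps=warmup,
    # the default optimizer_params use lr=2e-5; a lower rate like 5e-6
    # with ~10% warm-up should behave similarly
    output_path="bert-stsb-cross-encoder",
)
```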

So Let's cross both of those off of here this and this and Now so before we actually go on to predicting labels with the with the cross encoder We need to actually create that unlabeled data set so let's do that through random sampling using the gold data set you already have and Then we can move on to the next steps okay, so I'll just Add a little bit separation in here.

So now we're going to go ahead and create the augmented data So as I said, we're going to be using random sampling for that and I find that the the easiest way to do that is to actually go ahead and use a pandas dataframe rather than using the Data set object that we currently have so I'm gonna go ahead and initialize that so we have our gold data that will be PD the dataframe and In here we're going to have sentence one sentence two system one That Is going to be equal to STSB Sentence one Okay, and as well as that we also have sentence two going to be STSB sentence two now we may also want to include our Label in there, although I wouldn't say this is really necessary Add it in So our label is just like And if I have look here so we have Gonna overwrite anything called gold.

It's okay So, okay, I'm gonna have a look at that as well so you can see a few examples of what we're actually working with I'll just go ahead and actually rerun these as well Okay, so there we have we have our gold data and now what we can do because we Reformatted that into a kind of data frame.

we can use the sample method to randomly sample different sentences. To do that, what I want to do is create a new DataFrame. This is going to be our unlabeled silver data; it's not a silver dataset yet, because we don't have the labels or scores, but this is where we'll put them. And in here,

we again have sentence1 and sentence2 columns, but at the moment they're empty; there's nothing in there yet. So what we need to do is iterate through all of the rows in here. Before that, I'm just going to import tqdm: from tqdm.auto import tqdm. That's just a progress bar, so you can see where we are; I don't really like waiting with no idea of how long something is taking to process. Then, for sentence one in tqdm of a list of a set, so we're looping over all the unique values in the gold DataFrame's sentence one column, I'm going to randomly sample five sentences from the other column, sentence two, to be paired with that sentence one. The sentence two phrases we're going to sample come from the gold data, of course, and we only want to sample from rows where sentence one is not equal to the current sentence one, because otherwise we could introduce duplicates. We're going to remove duplicates anyway, but let's just exclude them from the sampling in the first place.

So we're going to Take that so all of the gold data set that where sentence one is Not equal to sentence one and what I'm going to do is just sample five of those rows Like that now from that. I'm just going to extract sentence to sort of five sentence two phrases that we have there and I'm going to convert them into a list and now for sentence two In the sampled list that we've just created.

I take my pairs and append a new pair, pairs.append, with sentence one set to the current sentence one and sentence two equal to the sampled sentence two. Now, this will take a little while, so what I'm going to do is maybe not include the full dataset here; let me just go with the first 500, see how long that takes, and also have a look at what we get from that. Okay, so yes, it's much quicker, and we have our sentence one values. Let me remove that output from there, and

Let me Remove that from there And Let's just say that top ten, right? so because we we're taking five of sentence one every time and random sampling it we can see that we have a few of those and Another thing that we might do is remove any do fits now That probably isn't any duplicates here, but we can check so pairs equals pairs up drop duplicates and Then we'll check the length of pairs again Okay also prints Let me run this again and print Okay So they were not any do it's anyway, but that's it it's a good idea to add that in just in case and Now I want to do is Actually take the cross encoder.

In fact, actually let's go back to our little flow charts. So we have Now created a larger unlabeled data set So it's good and now we go on to predicting the labels of our cross encoder so down here What I'm gonna do is take the cross encoder code here and what I've done is I've trained this already and I've uploaded it to the Plugin-based models.

this: I'm going to write jamescalam, and it's called bert-stsb-cross-encoder. Okay, so that's our cross-encoder, and now what I want to do is use that cross-encoder to create our labels.

That will create our silver dataset. To do that, I'm going to call it silver for now; it isn't really the silver dataset yet, but that's fine. I create a list by zipping both of the columns from our pairs, so pairs sentence one and pairs sentence two. That gives us all of our pairs again; you can look at those, it's just like this. What we want to do now is actually create our scores.

So we just take the cross-encoder, which we loaded as ce, and call ce.predict, passing in that silver data. Let's run it; it might take a moment. Okay, it's definitely taking a moment, so let me pause it and just do, let's say, ten, because I already have the full dataset and can show you that elsewhere. Let's have a look at what we have in those scores, the first three of them.
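Those two steps together, as a sketch; the hub model ID is the one mentioned above, and here I score all pairs rather than just ten:

```python
# load the fine-tuned cross-encoder (local path or the uploaded hub copy)
ce = CrossEncoder("jamescalam/bert-stsb-cross-encoder")

silver = list(zip(pairs["sentence1"], pairs["sentence2"]))
scores = ce.predict(silver)  # one similarity prediction per pair
```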

So three of them So we have an array and we we have these these scores. Okay, so that They are our predictions our similarity predictions for the first three now because they're randomly sampled a lot of these are negative So if we go silver Say negative, I mean more They're not relevant.

Yeah, we can see they're not particularly relevant. That's one issue with this approach, and you can try to mitigate it: after creating your scores, if you oversample, generating a lot of records, you can then remove most of the low-scoring samples and keep all of your high-scoring samples, and that will help you deal with the imbalance in your data. So what I'm going to do is add those scores to the labels column, which won't actually cover all of the pairs, because we only have ten scores in here.

So Let me maybe multiply that So this isn't you shouldn't do this obviously it's just so they fit Okay, and Let's have look. Okay, so we now have sense one sense two and some labels and What you do, but I'm not going to run. This is you would write pairs to CSV And that's so you need to do this if you're running everything in the same notebook Well, it's probably a good idea so with CSV so I'm going to say the silver data is a tab separate file and Obviously the separator for for that type of file is it it's a tab character and I don't want to include in those Okay, and that will create the the silver data file that we can train with Which I do already have so We come over here we can we can see that I have this file and We have all of these different Sentence pairs and the scores that our encoder has assigned to them So I'm going to close that I'm going to go back to the demo and What I'm now going to do is actually First go back to the flow chart that we had.

I'm going to cross off predict labels And We're going to go ahead and fine-tune it the by encoder on both gold and silver data So we have the gold data Let's have a look at we have Yes, and the silver I'm going to load that from file. So PD read CSV Silver TSV Separator is tab Character and Let's have a look what we have make sure it's all loaded correctly looks good now I'm going to do is Put both those together.

So all_data is equal to gold plus silver, and we ignore the index, ignore_index equals True, so we don't get index errors. Then all_data.head(): we can see that we hopefully now have all of the data in here, and we check the length; it's definitely a bigger dataset than before, when it was just gold. So we now have a larger dataset, and we can go ahead and use it to fine-tune the bi-encoder, or sentence transformer.
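Loading the silver data back and combining it with the gold data:

```python
silver = pd.read_csv("silver.tsv", sep="\t")
all_data = pd.concat([gold, silver], ignore_index=True)
print(len(gold), len(all_data))  # the combined set should be much larger
```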

So what I'm going to do is take the training data code from up here. I think I've already run the import, so I don't need to import InputExample again. But what I want here is: for i, row in all_data.iterrows(), because this is a DataFrame and we're iterating through each row. We have row sentence one, row sentence two, and also a label, so

We have row sentence one sentence two and also a label so We load them and to our train data and we can have a look at that train data See what it looks like Okay, we see that we get all these Inputs a sample objects if you want to see what one those has inside you can access the text like this Should probably do that on a New cell.

So let me pull this down here; you can also access the label, to see what we have in there. Okay, that looks good, and we can now take that, like we did before, and load it into a data loader. So, let me go up again and copy that; where are you?
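Putting those cells together, this step looks roughly like the following (batch size again my choice):

```python
train_data = [
    InputExample(texts=[row["sentence1"], row["sentence2"]],
                 label=float(row["label"]))  # float, not int (see the error later)
    for _, row in all_data.iterrows()
]
loader = DataLoader(train_data, batch_size=16, shuffle=True)
```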

I take this, bring it down here, and run it; that creates our data loader, and we can move on to actually initializing the sentence transformer, or bi-encoder, and training it. So we run from sentence_transformers import models, and we also import SentenceTransformer. Now, to initialize our sentence transformer, if you've been following along with this series of videos and articles, you'll know that we do something that looks like this: we have bert, which is going to be models.Transformer, and here we're just loading a model from Hugging Face Transformers, bert-base-uncased. We also have our pooling layer, so models again, models.Pooling, and in here we want to include the dimensionality of the vectors that the pooling layer should expect, which is just going to be bert.get_word_embedding_dimension(). It also needs to know what type of pooling we're going to use: are we going to use CLS pooling, mean pooling, max pooling, or so on? We're going to use mean pooling, so we use pooling_mode_mean_tokens.

So models again, and we have pooling and in here we want to include the dimensionality of the the vectors that the pooling layer should respect which is just going to be birds dot get word embedding dimension and Also, it needs to know what type of pooling we're going to use we're going to use CLS pooling are going to use mean pooling max pooling or so on now we are going to use Pooling and we're going to use a mean To mode mean tokens.

Let me say that's true so they're the two Let's say components in our in our sentence transformer and we need to now put those together So we're gonna call model equals sentence transformer and we write modules and Then we just pass as a list that and also cooling Okay So we run that we can also have one model looks like Okay, and we have you see we have a sentence transformer object and inside there.

We have two Layers or components first ones are transformer It's a BERT model and the second one is our pooling and we can see here the only pooling method that is set to true is the Mode mean tokens, which means we're going to take the mean across all the word Embeddings output by BERT and use that to create our sentence embedding or vector so with that model now defined we can Initialize our loss function.

So we do want to write from sentence transformers Losses import cosine similarity loss It's okay sign Similarity loss and in here we seem to pass the model. So understands which parameters to to actually optimize and Initialize that and then we sell our training function or the fit function and That's similar to before the cross encoder although slightly different.

So let me let me take that. It's a little further up From here Then take that and we're just gonna modify it so Warm up. I'm going to warm up for a 15% of the number of steps now. We're going to run through we change this to model it's not it's not see anymore and Like I said, there are some differences here.

So we have a training objectives. That's different and This is just a list of all the training objectives. We have we are only using one and we just pass loader and loss into that Evaluator we could use an evaluator. I'm not going to For this one, I'm going to evaluate everything afterwards the Epochs and warm steps are the same.

The only thing that's different is the output path, which is going to be but STS be That's that's it so go ahead and run that should should run let's check that it does Okay, so I've got this error here so it's lucky that we we checked and we have this runtime error found D type long but expected float and If we come up here, it's going to be in the data load or in the data that we've Initialized so I Yeah, so here I've put int first some region.

I'm not sure why did that? So this should be a float that the label in your your training data And that should be the same Up here as well Okay, so here as well the the cross encoder we would expect a float value So just be aware that I'll make sure there's a note in the video earlier on for that okay, and Okay, let's continue through that and try and rerun it should be okay now.

Oh I need to actually rerun everything else as well So rerun this Okay label 1.0 This is This for a moment just to be sure that is actually running this time, but it does look good, so Yeah So Looks good when for some reason in the notebook I'm actually seeing the number of iterations, but okay Yeah, pause it now and we can see that.

Yes. It did run through two iterations. So it is running Correctly now, that's good so That's great What I want to do now is actually show you okay valuation of these models. So Back to our flow chart quickly. Okay, so fine-tuned by coda. We've just done it So we've now finished without in the main or augmented expert training strategy and Yeah, let's move on to the evaluation.

Okay, so my evaluation script here is maybe not the easiest to read, but basically all we're doing is importing the EmbeddingSimilarityEvaluator, loading the GLUE STSB data again, taking the validation split, which we didn't train on, converting it into InputExamples, and feeding those into our EmbeddingSimilarityEvaluator. We load the model, whose name is passed through a command-line argument up here, and then it just prints out the score.
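A rough sketch of what that script does, for the bi-encoder case (the cross-encoder would be scored with CECorrelationEvaluator instead):

```python
import sys
from datasets import load_dataset
from sentence_transformers import InputExample, SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model_name = sys.argv[1]  # model name/path passed as a command-line argument

dev = load_dataset("glue", "stsb", split="validation")
examples = [
    InputExample(texts=[r["sentence1"], r["sentence2"]], label=r["label"] / 5.0)
    for r in dev
]
evaluator = EmbeddingSimilarityEvaluator.from_input_examples(examples)

model = SentenceTransformer(model_name)
print(evaluator(model))  # correlation between predicted and true similarities
```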

So let me switch across to the command line, and we can see how that actually performs. Okay, I've just switched across to my other desktop, because this is much faster, so I can actually run this quickly. So, python 03, we're going to run that evaluation script, and we're going to pass all three models: the cross-encoder, the sentence transformer trained using Augmented SBERT, and a sentence transformer trained purely on the gold dataset.

So first, let's have a look at the bert-stsb model trained on just the gold dataset. I run this; it might take a moment to download. Okay, everything downloaded, and we've got a score of 0.506. So the predictions of the model correlate with the actual scores at around 0.5; they do correlate.

It's not bad; it's not great either. Let's have a look at the cross-encoder. Again, we run it, and we get a score of 0.58. So, as we'd expect, training on just the gold data, the cross-encoder does outperform the bi-encoder, or sentence transformer. The final one: okay, with the augmented data, how does the sentence transformer perform?

So let's run that again, wait for it to download, and we get a much better score of 0.69. The correlation here is much higher with the augmented data than if we had just used the gold dataset alone, so it really has improved the performance a lot. Now, this is maybe an atypical performance increase.

It's an increase of around 18 points in performance, from 0.506 to 0.69, and that's good; but if you look at the original paper from Nils Reimers and co, they found a sort of expected performance increase of, I believe, seven or nine points. So this is definitely pretty significant, definitely a bit more than that, but I think it goes to show how good this training strategy can actually be.

I hope this has been useful and I hope this helps a few of you kind of Overcome the sometimes lack of data that we find and I think a lot of our Particular use cases. Thank you very much for watching and I will see you in the next one