How to Do Sentiment Analysis with Flair in Python
Chapters
0:00
0:42 Install Flair
2:00 Import a Sentiment Model
8:36 Labels Method
00:00:00.000 |
Hi, and welcome to this video on sentiment analysis using the Flair library. 00:00:06.660 |
So Flair is an incredibly simple, easy-to-use library, which contains a load of pre-built 00:00:12.880 |
models for NLP that we can simply import and use to make predictions. 00:00:19.480 |
So it actually allows us to use some of the most powerful models out there as well. 00:00:24.580 |
So in this tutorial, we're going to be using the DistilBERT model, which is based on 00:00:29.940 |
BERT, but a lot smaller, whilst remaining almost as powerful as BERT itself. 00:00:39.580 |
First, if you haven't already, you need to pip install Flair. 00:00:46.700 |
And alongside Flair, you are also going to need PyTorch. 00:00:50.820 |
If you haven't got PyTorch installed already, you'll need to head over to the PyTorch website. 00:00:58.820 |
And they give you instructions on exactly what you need to install. 00:01:03.340 |
So we come down to here and we can see, okay, for me, I have Windows. 00:01:08.180 |
I want to install using Conda, using Python, and then CUDA. 00:01:14.100 |
So this is if you have a CUDA-enabled GPU on your machine. 00:01:19.000 |
If you don't know what that means, you probably don't. 00:01:29.860 |
So all we need to do is copy the command underneath here, and then we would run this in our Anaconda prompt. 00:01:41.140 |
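For reference, the install commands look something like this (the conda line below is only an example; copy the exact command that pytorch.org generates for your own OS, package manager, and CUDA selection):

```shell
# Install the Flair NLP library
pip install flair

# Example PyTorch install via conda -- use the exact command
# generated for your platform on pytorch.org instead
conda install pytorch pytorch-cuda=11.8 -c pytorch -c nvidia
```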
I already have these installed, so I'm going to go ahead and actually begin coding. 00:01:47.500 |
So we're going to need to use Pandas and also Flair. 00:01:56.960 |
So now we have imported Flair, we can actually import a sentiment model straight away. 00:02:04.020 |
So all we need to do is pass our sentiment model to a variable, which we will 00:02:14.780 |
call model. And we just need to write flair.models.TextClassifier and its load method. 00:02:29.520 |
And then in here, we pass the name of the model that we would like to load. 00:02:34.960 |
And in our case, it will be the English sentiment model, which is en-sentiment. 00:02:53.220 |
And in a moment, that will have downloaded and we can begin using it. 00:03:01.120 |
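As a sketch of the steps so far (assuming Flair and PyTorch are installed; the first call downloads the model weights, so it can take a moment):

```python
import flair

# Load the pre-trained English sentiment classifier;
# the model is downloaded on first use
model = flair.models.TextClassifier.load('en-sentiment')
```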
I have downloaded some data here, which is a sentiment data set. 00:03:19.000 |
Okay, so it's the Sentiment Analysis on Movie Reviews data set, so it's from Rotten Tomatoes. 00:03:26.200 |
And you scroll down and we have the training data and test data here. 00:03:31.400 |
I'm just going to use the test data, but we can use either. 00:03:35.320 |
We're just going to be making predictions based on the phrase here. 00:03:47.400 |
So it's going to read it in as if it were a CSV file, and we will just pass a tab as 00:03:54.160 |
our separator because we are actually working with a tab-separated file. 00:04:21.000 |
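That read step might look like this; here a tiny in-memory stand-in replaces the Kaggle test.tsv file, and the column names are assumptions based on that data set:

```python
import io
import pandas as pd

# Tiny stand-in for test.tsv; in the video this would be
# pd.read_csv('test.tsv', sep='\t')
tsv_data = (
    "PhraseId\tSentenceId\tPhrase\n"
    "1\t1\tAn intermittently pleasing film\n"
    "2\t1\tAn intermittently pleasing\n"
)

# The file is tab-separated, so pass '\t' as the separator
df = pd.read_csv(io.StringIO(tsv_data), sep='\t')
print(df.columns.tolist())  # ['PhraseId', 'SentenceId', 'Phrase']
```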
Okay, so the first thing you'll notice is that we actually have duplicates of the same 00:04:38.000 |
sentence. So this first entry here is the full phrase, and then all of these following it are actually just sub-phrases of it. 00:04:47.000 |
So let's change the display settings so we can actually see the full phrase first. 00:04:59.440 |
Okay, so we can't really see that much more anyway, but that's fine. 00:05:14.400 |
So to remove this, we just want to drop all of the duplicates whilst keeping the first entry. 00:05:22.860 |
So you see each one of these, they all have the same sentence ID. 00:05:26.960 |
It's actually only the first one that we need. 00:05:29.960 |
So we just drop duplicates on this column, keeping the first entry. 00:05:53.520 |
Okay so we're keeping the first entry, dropping duplicates on the sentence ID column. 00:06:04.440 |
Okay so now we can see each sample is now a unique entry. 00:06:14.580 |
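A minimal sketch of that de-duplication step, using a small stand-in DataFrame (column names assumed from the Kaggle data):

```python
import pandas as pd

# Stand-in for the movie reviews data: the first row is the full
# phrase, the rest are sub-phrases sharing the same SentenceId
df = pd.DataFrame({
    'SentenceId': [1, 1, 1, 2],
    'Phrase': [
        'An intermittently pleasing but mostly routine effort',
        'An intermittently pleasing',
        'intermittently pleasing',
        'Kidman is really the only thing worth watching',
    ],
})

# Keep only the first (full) phrase for each sentence
df = df.drop_duplicates(subset='SentenceId', keep='first')
print(len(df))  # 2
```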
So we need to actually first convert our text into a tokenized Sentence object using Flair. 00:06:27.740 |
So if we, for example, pass 'Hello world!' into the Flair tokenizer, we will be able to see how it gets tokenized. 00:06:49.420 |
Okay so here we can see that it's split each one of these into tokens. 00:06:55.340 |
So we've got Hello as a token, world as a token, and then we have also split the exclamation mark into its own token. 00:07:04.460 |
And you can see that Flair is telling us that there are a total of three tokens there. 00:07:09.100 |
So each one of our samples here will need to be processed by this flair.data.Sentence 00:07:16.140 |
class before we pass it into the actual model. 00:07:22.380 |
Once we do have this, so let's call this Sample as well, we will pass it to our model for 00:07:33.580 |
prediction, which is really easy, all we need to do is call the predict method on the sample. 00:07:45.980 |
And now this doesn't output anything, instead it actually just modifies the sentence object 00:07:53.660 |
that we have produced, so it modifies Sample. 00:07:58.260 |
And we can see now that our Sample still shows the sentence and the number of tokens, 00:08:03.060 |
but we also have these additional labels, which are the predictions. 00:08:09.020 |
We have the label, which is POSITIVE, meaning a positive sentiment. 00:08:16.140 |
And then what we have here is actually the probability or the confidence in that prediction. 00:08:25.140 |
That's great, but realistically we want to be extracting these labels. 00:08:31.360 |
So we're actually able to extract these by accessing the labels attribute. 00:08:39.580 |
So we have labels here and this produces the positive and the confidence. 00:08:46.160 |
To access each one of these we access index 0 followed by dot value. 00:09:03.480 |
And then we can also do the same to get the confidence, called score, like that. 00:09:13.400 |
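Putting the prediction and extraction steps together (a sketch, assuming Flair is installed; the en-sentiment model downloads on first use, and the example phrase is my own):

```python
from flair.data import Sentence
from flair.models import TextClassifier

model = TextClassifier.load('en-sentiment')

# Tokenize, then predict in place -- predict() returns nothing
# and instead attaches labels to the Sentence object
sample = Sentence('I love this movie!')
model.predict(sample)

print(sample.labels[0].value)  # 'POSITIVE' or 'NEGATIVE'
print(sample.labels[0].score)  # confidence between 0 and 1
```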
So what we can do now is just create a simple for loop that will go through each sample 00:09:20.180 |
in our test data and assign a sentiment and confidence for each one. 00:09:26.960 |
So we will initially create a sentiment and confidence list. 00:09:39.120 |
Then we will just, as we are looping through the data, we will append our sentiment value, 00:09:44.880 |
so the positive or negative, and the confidence to each one of these lists. 00:10:10.820 |
So here we are first tokenizing our sentence. 00:10:18.060 |
Then we are making a prediction using that tokenized sentence, which we are calling sample. 00:10:26.740 |
And as we did before, we have now got this labeled sentence and we just need to extract the label and the score. 00:10:57.820 |
Okay so we can see here that one of our sentences was just blank. 00:11:02.580 |
So we will add in some logic to avoid any errors there. 00:11:30.940 |
Okay so looking at this, it also happens whenever the phrase is just a space. 00:11:36.140 |
So we just need to trim this, which we can do easily using the strip method. 00:11:47.820 |
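The full loop might be sketched like this (assuming the `model` and `df` from earlier; the strip call and the empty-phrase guard are the fixes just described, and the placeholder values for blank rows are my own choice):

```python
sentiment = []
confidence = []

for phrase in df['Phrase']:
    phrase = phrase.strip()  # drop leading/trailing whitespace
    if phrase == '':
        # blank phrases can't be tokenized; record placeholders
        sentiment.append('')
        confidence.append(0.0)
        continue
    sample = flair.data.Sentence(phrase)  # tokenize
    model.predict(sample)                 # label in place
    sentiment.append(sample.labels[0].value)
    confidence.append(sample.labels[0].score)
```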
Okay so it took a little bit of time, but we now have our predictions. 00:11:53.180 |
So what we want to do is actually add what we have here in the sentiment and confidence 00:12:02.020 |
lists to our dataframe. So to do that, we just write df['sentiment'] to create a new sentiment column and set 00:12:13.300 |
that equal to the sentiment list that we have created. 00:12:17.100 |
And we also do the same for confidence as well. 00:12:35.340 |
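That assignment step, with small stand-in lists (in the video these come from the prediction loop):

```python
import pandas as pd

df = pd.DataFrame({'Phrase': ['great film', 'terrible film']})

# Stand-in predictions; normally produced by the Flair loop
sentiment = ['POSITIVE', 'NEGATIVE']
confidence = [0.98, 0.95]

# Each list becomes a new column in the dataframe
df['sentiment'] = sentiment
df['confidence'] = confidence
print(df)
```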
Okay so initially looking at this, it looks pretty good. 00:12:38.760 |
So this one reads 'intermittently pleasing, but mostly routine effort'. 00:12:43.460 |
Occasionally negative, but basically saying it's occasionally okay, but generally nothing special. 00:12:50.260 |
So obviously it's a negative sentiment, which is matched up to negative sentiment here. 00:12:55.660 |
Here we're saying okay Kidman's the only thing that's worth watching in Birthday Girl. 00:13:01.140 |
And it says another example of the sad decline of British comedies in the post-Full Monty world. 00:13:10.220 |
So this one is our first positive: once you get into its rhythm, the movie becomes... 00:13:16.340 |
Yeah, I mean it sounds pretty positive to me. 00:13:21.720 |
Even here where we're not saying any particularly negative or positive word, 00:13:27.220 |
we're just saying that the movie delivers performances of striking skill and depth, 00:13:33.380 |
which must be pretty hard for a machine to understand and actually get right. 00:13:41.400 |
But looking at all these, it's doing really well. 00:13:44.420 |
And I think it's really cool that we can actually do this with so little effort, and we've only 00:13:50.740 |
actually written a few lines of code in reality. 00:13:54.340 |
And it's producing really good, accurate results, which is really impressive to me. 00:13:59.040 |
So that's it for this video, I hope it's been useful. 00:14:04.020 |
And thank you for watching, and I will see you again in the next one.