
How to do Sentiment Analysis with Flair in Python


Chapters

0:0
0:42 Install Flair
2:0 Import a Sentiment Model
8:36 Labels Method

Transcript

Hi, and welcome to this video on sentiment analysis using the Flair library. So Flair is an incredibly simple, easy-to-use library, which contains a load of pre-built models for NLP that we can simply import and use to make predictions. So it actually allows us to use some of the most powerful models out there as well.

So in this tutorial, we're going to be using the DistilBERT model, which is based on BERT, but it's a lot smaller and almost as powerful as BERT itself. So we're going to go ahead and begin. First, if you haven't already, you need to pip install Flair. And alongside Flair, you are also going to need PyTorch.

If you haven't got PyTorch installed already, you'll need to head over to the PyTorch website. And they give you instructions on exactly what you need to install. So we come down to here and we can see, okay, for me, I have Windows. I want to install using Conda, using Python, and then CUDA.

So this is if you have a CUDA-enabled GPU on your machine. If you don't know what that means, you probably don't. So in that case, just click none. But for me, I have 10.2. So all we need to do is copy the command underneath here, and then we would run this in our Anaconda prompt.

I already have these installed, so I'm going to go ahead and actually begin coding. So we're going to need to use Pandas and also Flair. So now we have imported Flair, we can actually import a sentiment model straight away. So all we need to do is assign our sentiment model to a variable, which we will call sentiment model.

And we just need to write flair.models.TextClassifier.load. And then in here, we pass the model name that we would like to load. And in our case, it will be the English sentiment model, which is en-sentiment. Okay, so now we are downloading the model. And in a moment, that will have downloaded and we can begin using it.
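The loading step described here can be sketched roughly as follows. The helper name `load_sentiment_model` is only an illustration and not something from the video, and the import sits inside the function so that defining the helper does not itself need Flair to be importable. The first call downloads the model, so it needs a network connection.

```python
def load_sentiment_model(name: str = "en-sentiment"):
    """Load a pre-trained Flair text classifier by its model name.

    `load_sentiment_model` is a hypothetical helper name; the call it
    wraps, TextClassifier.load, is the one named in the transcript.
    """
    # Imported lazily: flair (and PyTorch) are only needed when this runs.
    from flair.models import TextClassifier

    # Downloads the model on first use, then reads it from the local cache.
    return TextClassifier.load(name)
```

Calling `sentiment_model = load_sentiment_model()` would then give the variable used in the rest of the walkthrough.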

Now obviously, we need data. I have downloaded some data here, which is a sentiment data set based on the, I think it's the IMDB Movie Reviews. So you can find the same data set over here. Okay, so it's Sentiment Analysis on Movie Reviews data set, so it's from Rotten Tomatoes.

And you scroll down and we have the training data and test data here. I'm just going to use the test data, but we can use either. We're just going to be making predictions based on the phrase here. So we need to read in our data. So it's going to read it in as if it were a CSV file, and we will just pass a tab as our separator because we are actually working with a tab-separated file.
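A minimal sketch of reading the tab-separated file with Pandas is shown here. The helper name `read_reviews` is made up for illustration, and the file name in the comment is an assumption about what the downloaded file is called.

```python
import pandas as pd


def read_reviews(source):
    """Read a tab-separated reviews file into a DataFrame.

    `source` can be a file path or an open file-like object.
    """
    # read_csv handles TSV files once the separator is set to a tab.
    return pd.read_csv(source, sep="\t")


# Assumed usage, with a hypothetical local file name:
# df = read_reviews("test.tsv")
```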

Okay, so here, it's actually a TSV, not CSV. Okay, so the first thing you'll notice is that we actually have duplicates of the same phrase. That is actually just how this data set is. It just contains the full phrase initially. So this first entry here is the full phrase, and then all of these following it are actually parts of that phrase.

So what we can do, so let's change it so we can actually see the full phrase first. Okay, so we can't really see that much more anyway, but that's fine. So to remove this, we just want to drop all of the duplicates whilst keeping the first instance of the sentence ID.

So you see each one of these, they all have the same sentence ID. It's actually only the first one that we need. So we just drop duplicates on this column, keeping the first entry. Okay so we're keeping the first entry, dropping duplicates from sentence ID, and we're just doing this operation in place.
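As a rough illustration of that de-duplication, here is the same operation on a tiny made-up DataFrame. The rows are invented for the example, and the column names `SentenceId` and `Phrase` are an assumption about how the data set labels its columns.

```python
import pandas as pd

# Toy rows, invented for illustration: the first row of each SentenceId
# is the full phrase, the rest are fragments of it.
df = pd.DataFrame(
    {
        "PhraseId": [1, 2, 3, 4],
        "SentenceId": [1, 1, 1, 2],
        "Phrase": ["A fine film .", "A fine film", "A", "Dull ."],
    }
)

# Keep only the first row for each sentence ID, modifying df in place.
df.drop_duplicates(subset=["SentenceId"], keep="first", inplace=True)
```

After this, `df` holds one row per sentence ID, and the original index values of the kept rows are preserved.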

Okay so now we can see each sample is now a unique entry. Okay so now our data is ready. So we need to actually first convert our text into a tokenized list using Flair. So Flair does this one sentence at a time. So if we, for example, pass Hello World into the Flair tokenizer, we will be able to see what it's actually doing.

Okay so here we can see that it's split each one of these into tokens. So we've got Hello as a token, World as a token, and then we have also split the exclamation mark at the end there. And you can see that Flair is telling us that there are a total of three tokens there.

So each one of our samples here will need to be processed by this flair.data.Sentence class before we pass it into the actual model. Once we do have this, so let's call this Sample as well, we will pass it to our model for prediction, which is really easy, all we need to do is call the predict method on the sample.

And now this doesn't output anything, instead it actually just modifies the sentence object that we have produced, so it modifies Sample. And we can see now that in our Sample, we still have the sentence and we still have the number of tokens, but we also have these additional labels, which are the predictions.
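The tokenize-then-predict step might be sketched like this. The helper name `predict_one` and its `make_sentence` parameter are hypothetical, added so the sketch can be tried with a stand-in object; by default it falls back to Flair's own `Sentence` class.

```python
def predict_one(text, model, make_sentence=None):
    """Tokenize `text` and run `model.predict` on it, returning the sample."""
    if make_sentence is None:
        # Imported lazily so the helper can be defined without flair present.
        from flair.data import Sentence

        make_sentence = Sentence

    sample = make_sentence(text)  # tokenizes the text
    model.predict(sample)  # returns nothing; labels are attached to `sample`
    return sample
```

With the real model this would be called as `sample = predict_one("Hello World!", sentiment_model)`, and printing `sample` should then show the predicted label next to the tokens.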

We have the label, which is positive, which means it's a happy or it's a positive sentiment. And then what we have here is actually the probability or the confidence in that prediction. That's great, but realistically we want to be extracting these labels. So we're actually able to extract these by accessing the labels method.

So we have labels here and this produces the positive and the confidence. To access each one of these we access index 0 followed by dot value. So this will give us the positive. And then we can also do the same to get the confidence, called score, like that. So what we can do now is just create a simple for loop that will go through each sample in our test data and assign a probability for each one.
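Pulling the two values out of a labelled sample could look like the sketch below. The function name `extract_label` is made up for illustration; it only relies on the `labels`, `value` and `score` names mentioned above.

```python
def extract_label(sample):
    """Return (sentiment, confidence) from a sample that has been predicted on."""
    label = sample.labels[0]  # the first (and here only) predicted label
    return label.value, label.score
```

So `sentiment, confidence = extract_label(sample)` would give the label text and its confidence as two separate variables.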

So we will initially create a sentiment and confidence list. Then we will just, as we are looping through the data, we will append our sentiment value, so the positive or negative, and the confidence to each one of these lists. So here we are first tokenizing our sentence. Then we are making a prediction using that tokenized sentence, which we are calling sample.

And as we did before, we have now got this labeled sentence and we just need to extract the two labels that we have here. Okay so we can see here that one of our sentences was just blank. So we will add in some logic to avoid any errors there.

Okay so looking at this, it's also whenever there's a space as well. So we just need to trim this, which we can do easily using the strip method. Okay so it took a little bit of time, but we now have our predictions. So what we want to do is actually add what we have here in the sentiment and confidence list to our data frame.
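One way the loop with the blank-phrase check could be written is sketched below. The function name `score_phrases` and its parameters are hypothetical. Appending `None` for blank phrases is an assumption made here so that both lists stay the same length as the input; the video does not say what value it stores in that case.

```python
def score_phrases(phrases, model, make_sentence):
    """Predict a sentiment label and confidence for each phrase.

    `make_sentence` turns text into a tokenized sentence object
    (with Flair this would be flair.data.Sentence).
    """
    sentiment = []
    confidence = []
    for phrase in phrases:
        # Trim whitespace so that " " is treated the same as "".
        text = phrase.strip() if isinstance(phrase, str) else ""
        if not text:
            # Assumed placeholder for blank phrases (not from the video).
            sentiment.append(None)
            confidence.append(None)
            continue
        sample = make_sentence(text)  # tokenize
        model.predict(sample)  # attaches labels to `sample`
        sentiment.append(sample.labels[0].value)
        confidence.append(sample.labels[0].score)
    return sentiment, confidence
```

With the real objects this might be called as `sentiment, confidence = score_phrases(df["Phrase"], sentiment_model, Sentence)`, where the column name `Phrase` is again an assumption about the data set.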

So to do that, we just add df sentiment to create a new sentiment column and we made that equal to the sentiment list that we have created. And we also do the same for confidence as well. Now we can see our data frame. Okay so initially looking at this, it looks pretty good.
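Adding the two lists as columns might look like this sketch. The helper name `add_predictions` is made up, and it returns a copy rather than modifying the data frame in place, which is a choice made for this example only.

```python
import pandas as pd


def add_predictions(df, sentiment, confidence):
    """Return a copy of `df` with sentiment and confidence columns added."""
    out = df.copy()
    # Lists are assigned by position, so they must have one entry per row.
    out["sentiment"] = sentiment
    out["confidence"] = confidence
    return out
```

Because the lists are matched by position, this works even though dropping duplicates leaves gaps in the index.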

So intermittently pleasing, but mostly routine effort. Occasionally negative, but basically saying it's occasionally okay, but generally nothing special. So obviously it's a negative sentiment, which is matched up to negative sentiment here. Here we're saying okay Kidman's the only thing that's worth watching in Birthday Girl. And it says another example of the sad decline of British comedies in the post-Full Monty world.

Fair enough. Also negative. So this one is our first positive, once you get into it, it's relevant, the movie becomes a heady experience. Yeah, I mean it sounds pretty positive to me. So it's quite good. Even here where we're not saying anything particularly like a negative or positive word, we're just saying that the movie is, or the movie delivers on the performance of striking skill and depth, which must be pretty hard for a machine to understand and actually get it right.

But looking at all these, it's doing really well. And I think it's really cool that we can actually do this with so little effort, and we've only actually written a few lines of code in reality. And it's producing really good, accurate results, which is really impressive to me. So that's it for this video, I hope it's been useful.

And thank you for watching, and I will see you again in the next one.