
NER With Transformers and spaCy (Python)



00:00:00.000 | Okay, in this video, we're going to have a look at using named entity recognition
00:00:04.900 | through the spaCy library, which I've already done a video on before.
00:00:09.000 | But this time, rather than using the traditional spaCy models,
00:00:12.400 | we're going to use a transformer model.
00:00:14.700 | Now, the setup for this is pretty similar to normal spaCy.
00:00:20.500 | So the very first thing we want to do is obviously install spaCy transformers.
00:00:29.300 | So to do that, all we need to write is pip install spaCy
00:00:35.600 | and then in square brackets, transformers.
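For reference, the full command being described here is:

    pip install spacy[transformers]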
00:00:38.700 | Now, alongside this, we can also install what we need to run it with CUDA.
00:00:49.700 | So if you have CUDA, you can check your version number like this.
00:00:57.900 | Okay, and so for me, I can see that I have CUDA 11.1.
00:01:04.600 | And so what I would do, rather than just writing pip install spacy[transformers],
00:01:11.200 | is add the CUDA version as another extra.
00:01:16.100 | So after transformers, I include my CUDA version.
00:01:20.200 | I'm on 11.1, so I put cuda111.
00:01:27.400 | And then I'd enter this and install spaCy.
00:01:30.000 | But I already have it installed, so I'm not going to do that again.
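As a sketch, the GPU-enabled install looks like this. The transcript doesn't show the exact version check, so nvcc --version is an assumption (nvidia-smi is another common option):

    # Check the installed CUDA version (assumed check command; nvidia-smi also works)
    nvcc --version
    # Install spaCy with the transformers and CUDA 11.1 extras
    pip install spacy[transformers,cuda111]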
00:01:33.300 | Now, once we have installed spaCy transformers,
00:01:38.700 | we also need to download the transformer model from spaCy.
00:01:42.500 | And to do that, we would write this.
00:01:44.700 | So we write python, then -m to specify the spacy module,
00:01:49.000 | and then we write download.
00:01:50.800 | And the transformer model is very similar to the other spaCy models
00:01:56.200 | if you've used them before.
00:01:57.300 | We have to do this for every spaCy model, not just the transformer model.
00:02:01.300 | So depending on which model you want,
00:02:06.400 | I'm going to be using the English one.
00:02:07.900 | There are other models as well for other languages.
00:02:10.800 | We're using the English core web model.
00:02:15.400 | And I think there's only a web transformer model,
00:02:18.700 | but I'm not 100% sure on that, so don't take my word for that.
00:02:23.600 | And then the transformer part of this is trf at the end.
00:02:27.400 | So usually the models we typically use are sm for small and lg for large.
00:02:33.500 | This time we're using the transformer version of that model.
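Putting that together, the download command for the English transformer model is:

    python -m spacy download en_core_web_trf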
00:02:37.100 | Again, I already have it installed, so I'm not going to run it again.
00:02:41.000 | And now to start using this,
00:02:46.100 | we do what we would normally do with spaCy.
00:02:49.100 | So all we actually need to do is import spaCy.
00:02:53.900 | And I'm also going to import displaCy
00:02:56.400 | so that I can visualize the named entity recognition.
00:03:00.300 | So from spaCy import displaCy.
00:03:04.000 | And then what we do is almost exactly the same
00:03:12.300 | as what we always do with spaCy.
00:03:16.200 | There's really no difference here.
00:03:18.100 | So we do nlp = spacy.load().
00:03:22.200 | And then here we include our model name.
00:03:25.200 | So usually what we do is write en_core_web_sm or lg.
00:03:30.900 | This time we're going to write trf for transformer.
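So the setup so far, as code:

    import spacy
    from spacy import displacy  # for visualizing the entities later

    # Load the transformer-based English pipeline
    nlp = spacy.load("en_core_web_trf")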
00:03:33.800 | Run that.
00:03:37.300 | And once we've loaded that,
00:03:39.900 | we can perform named entity recognition as we normally would.
00:03:44.200 | So what we do is doc = nlp().
00:03:49.100 | And then in here we pass our text.
00:03:51.600 | And then we want to visualize that.
00:03:57.400 | And we can do that in Jupyter just like this.
00:04:02.400 | So displacy.render.
00:04:03.800 | And then we pass in the document, or the doc, that we just created.
00:04:10.400 | And we want the style to be ent, for entities,
00:04:14.200 | because we're doing NER here.
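As a minimal sketch (the example string here is my own placeholder, not the text used in the video):

    # Run the pipeline over some text (placeholder example)
    text = "Apple hit an all-time high share price this January."
    doc = nlp(text)

    # Render the named entities inline; in Jupyter this displays automatically
    displacy.render(doc, style="ent")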
00:04:15.600 | Okay, so there we have our trf model.
00:04:21.800 | Now I'm going to swap these around
00:04:25.400 | because I want to compare this to what we would normally use,
00:04:29.100 | which is the...
00:04:30.000 | Let's use the large model
00:04:32.800 | because it's probably the closest match
00:04:34.900 | to the transformer model.
00:04:39.200 | And I'll just call this lg.
00:04:40.600 | And modify this to be lg as well.
00:04:44.800 | And what we'll do is just do the same thing again here.
00:04:50.600 | And here.
00:04:54.400 | And rerun that.
00:04:57.200 | And you can see that in this case,
00:04:59.300 | there's actually no difference.
00:05:00.400 | Now, I think this is to be expected
00:05:03.400 | because the traditional spaCy model,
00:05:07.900 | especially the large model,
00:05:09.000 | is still really good in terms of performance.
00:05:11.800 | So in most cases, or at least a fair few cases,
00:05:17.500 | we shouldn't really see that much difference
00:05:19.600 | because this is still a pretty solid model.
00:05:22.200 | And we can see this as well here.
00:05:26.400 | So this is a longer text.
00:05:30.300 | I got this from the investing subreddit.
00:05:33.700 | And let's do the same thing again.
00:05:36.400 | So we're going to take the transformer model.
00:05:40.300 | Also the traditional model.
00:05:44.200 | We'll convert both of those to docs.
00:05:51.100 | So just do it like this.
00:05:53.800 | Let's try and keep it reasonably tidy.
00:05:59.500 | Okay, and then we just want this.
00:06:07.000 | And this again.
00:06:09.000 | Okay.
00:06:12.100 | Now, again, with this one,
00:06:14.900 | we don't see any difference
00:06:16.700 | because, like I said,
00:06:18.400 | this traditional spaCy model is still pretty good.
00:06:22.800 | But what we can do is start adding more complex
00:06:28.500 | or at least longer pieces of text.
00:06:31.200 | And then let's compare how they perform.
00:06:33.400 | Because in this case,
00:06:34.500 | the transformer model will outperform the traditional model.
00:06:37.800 | Okay, so this text is quite a bit longer.
00:06:41.500 | And now let me copy these two.
00:06:46.500 | Bring them down here.
00:06:48.500 | And let's do that again.
00:06:53.200 | I'm not sure why this is taking a moment.
00:06:54.100 | Sorry, I was reloading the model without realizing it.
00:06:58.100 | Okay.
00:06:59.400 | And here and here.
00:07:02.400 | Now let's compare these two.
00:07:05.400 | So the transformer model is correctly identifying
00:07:10.100 | Fastly as an organization.
00:07:11.900 | The large model is identifying Fastly as a person.
00:07:15.900 | So you can see a little bit of difference there.
00:07:18.500 | Then up here,
00:07:21.600 | we have this "quarter one 21"
00:07:24.700 | identified as a date by the transformer model,
00:07:27.100 | and not at all by the large traditional model.
00:07:32.100 | Go a little bit further.
00:07:33.200 | So this one, I'm not really sure if this is a good thing or a bad thing.
00:07:36.600 | The transformer model is identifying "a whopping 27%"
00:07:40.700 | as the percent entity,
00:07:42.900 | which I don't think is really a good thing.
00:07:48.600 | But it depends on what you're doing.
00:07:50.300 | Maybe you actually want that exaggeration of the percentage in there,
00:07:54.800 | but I don't think so.
00:07:56.600 | So I suppose that depends on what you're doing.
00:08:02.400 | But I would say the transformer model is probably performing worse
00:08:06.400 | in that single case.
00:08:10.500 | The rest of these, the money percentages,
00:08:13.600 | they're all matching between the two models.
00:08:15.900 | Here, the large model is pulling out this "one" as a cardinal.
00:08:21.600 | I'm not sure we really want that one in there to be honest.
00:08:29.300 | And then if we continue,
00:08:31.100 | the transformer model is getting the Q1 date
00:08:33.800 | and the Fastly organization down here.
00:08:36.000 | The usual model is missing Q1,
00:08:38.600 | and we have Fastly as a person again.
00:08:40.700 | And then down here,
00:08:42.800 | the large model is seeing CFO as an organization,
00:08:47.200 | which is obviously not the case.
00:08:48.300 | It's actually a position, obviously.
00:08:50.700 | So, yeah, I mean, they're not far from each other for sure.
00:08:55.300 | But I think in this case,
00:08:56.900 | the transformer model is definitely outperforming,
00:09:00.000 | at least by a little bit.
00:09:01.700 | And I think the more complex the language gets,
00:09:05.500 | the better the transformer model will perform.
00:09:10.200 | So, I mean, that's it for this video.
00:09:13.500 | I just wanted to give you a quick demonstration
00:09:15.200 | of spaCy and transformers,
00:09:17.600 | which I think is really cool
00:09:18.800 | that you can actually use both of those packages together.
00:09:22.000 | So, thank you very much for watching,
00:09:25.400 | and I'll see you again in the next one.